We investigate the time and space complexity of detecting and preventing ABAs in 
shared memory algorithms for systems with n processes and bounded base objects. To 
that end, we define ABA-detecting registers, which are similar to normal read/write 
registers, except that they allow a process q to detect with a read operation, whether 
some process wrote the register since q's last read. ABA-detecting registers can be 
implemented trivially from a single unbounded register, but we show that they have 
a high complexity if base objects are bounded: An obstruction-free ABA-detecting 
single bit register cannot be implemented from fewer than n - 1 
bounded registers. Moreover, bounded CAS objects (or more generally, conditional 
read-modify-write primitives) offer little help to implement ABA-detecting single bit 
registers: We prove a linear time-space tradeoff for such implementations. We show 
that the same time-space tradeoff holds for implementations of single bit LL/SC 
primitives from bounded writable CAS objects. This proves that the implementations of 
LL/SC/VL by Anderson and Moir [2] as well as Jayanti and Petrovic [15] are optimal. 

We complement our lower bounds with tight upper bounds: We give an implementation 
of ABA-detecting registers from n + 1 bounded registers, which has step complexity 
O(1). We also show that (bounded) LL/SC/VL can be implemented from a 
single bounded CAS object and with O(n) step complexity. Both upper bounds are 
asymptotically optimal with respect to their time-space product. 

These results give formal evidence that the ABA problem is inherently difficult, that 
even writable CAS objects do not provide significant benefits over registers for dealing 
with the ABA problem itself, and that there is no hope of finding a more efficient 
implementation of LL/SC/VL from bounded CAS objects and registers than the ones 
from [2, 15]. 
1. Introduction 


Since the beginning of shared memory computing, programmers and researchers have had to deal 
with the ABA problem: Even though a process retrieves the same value twice in a row from a 
shared memory object, it is still possible that the value of the object has changed multiple times. 

Especially algorithms using the standard Compare-and-Swap (CAS) primitive seem to be 
susceptible. A CAS object provides two operations: Read() returns the value of the object, and CAS(x,y) 
changes the value of the object to y provided that its value v prior to the operation equals x, and 
it returns v. (According to some specifications a CAS(x,y) returns a Boolean, which is True if and 
only if the CAS() succeeded, i.e., it wrote y.) Often, CAS objects are used in the following way: 
First, a process p reads the value x stored in the CAS object, then it performs some computation, 
and finally it tries to propagate the result of the computation by performing a CAS(x,y). The idea 
is that if another process has already updated the data structure, p’s CAS() should fail, and so 
inconsistencies are avoided. However, if multiple successful CAS() operations have occurred and 
the value of the object has changed back to x, p’s CAS() might still succeed, possibly yielding 
inconsistencies. 
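
The scenario above can be replayed in a few lines. The following is a minimal sequential sketch, not part of the original text: the class name `CASObject`, the function `aba_demo`, and the values "A", "B", "C" are illustrative assumptions, and every method call stands for one atomic step.

```python
class CASObject:
    """Sequential model of a CAS object; each method call is one atomic step."""

    def __init__(self, value):
        self._value = value

    def read(self):
        return self._value

    def cas(self, expected, new):
        """Write `new` if the current value equals `expected`; return the old value."""
        old = self._value
        if old == expected:
            self._value = new
        return old


def aba_demo():
    """Replay the ABA pattern: p reads A, others change A -> B -> A, p's CAS succeeds."""
    obj = CASObject("A")
    x = obj.read()                 # process p reads x = "A"
    obj.cas("A", "B")              # another process changes the value to "B"
    obj.cas("B", "A")              # ... and a further CAS changes it back to "A"
    old = obj.cas(x, "C")          # p's CAS(x, y) with y = "C"
    succeeded = (old == x)         # p cannot tell that the object changed twice
    return succeeded, obj.read()
```

Here p's final CAS succeeds even though two successful CAS() operations happened after p's read, which is exactly the situation that may yield inconsistencies.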

ABAs are also a problem for algorithms using other strong primitives, or even only registers. For 
example, in mutual exclusion algorithms often processes busy-wait for certain events to happen, by 
repeatedly reading the same register. In systems with caches, the cost of waiting is small, because 
as long as no process changes the register value, all reads are cache hits. The event is signaled by 
other processes through a change in the register value. It may also be desirable to eventually 
reset the register to the state it had before the event was signaled, so that it can be reused. However, 
this may result in the ABA problem, and as a consequence waiting processes may miss events. 
Therefore, algorithm designers have to devise more complicated code in order to avoid unnoticed 
cache misses, or even lack of progress. 

Many shared memory algorithms and data structures have to deal with the ABA problem. Often 
this is done in an ad-hoc, application specific way m, or solutions are based on tagging mm 
[23h25([2TU29j (see below). Other papers combine tagging and memory management techniques, or 
suggest both as alternatives mm- 

Tagging, introduced by IBM [14], requires augmenting an object with a tag (which is sometimes 
called a sequence number) that gets changed with every write operation. This technique avoids the 
ABA problem only if tags never repeat. Therefore, theoretically, an infinite number of tags and 
thus base objects of unbounded size are required. One may argue that, in practice, for reasonably 
large base objects, a system will never run out of tags. However, this is unrealistic in cases where the 
tag has to be stored together with other information in the same object. In some cases, it is possible 
to store the tag in a separate object (e-g., [ED, but this requires technically difficult algorithms and 
tedious correctness proofs. Some architectures like the IBM System/370 [14] introduced a 
double-width CAS primitive, which allows one of two (32-bit) words to be used for storing tags. While using 
bounded tags does not completely avoid the ABA problem (because tag values may wrap around), 
it has been argued [Mll25[ l28 ] l29] that an erroneous algorithm execution due to an unexpected 
ABA becomes very unlikely. From a theoretical perspective this is unsatisfactory. Moreover, for 
practical applications, it is often necessary to use the entire object space (today usually comprising 
64 bits) for data, so the tagging technique requires double-width atomic instructions. Those are 
not supported by most mainstream architectures m- 
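
To illustrate why bounded tags only make an unnoticed ABA less likely rather than impossible, here is a small sequential sketch. It is an assumption-laden toy, not taken from the text: the 2-bit tag width (`TAG_BITS`), the class `TaggedCAS`, and the helper `stale_cas_succeeds` are all hypothetical names chosen for the example.

```python
TAG_BITS = 2                 # illustrative assumption: a very small tag
TAG_MOD = 1 << TAG_BITS      # tags take values 0 .. TAG_MOD - 1 and wrap around


class TaggedCAS:
    """Sequential model of a CAS object storing a (value, tag) pair."""

    def __init__(self, value):
        self._state = (value, 0)

    def read(self):
        return self._state

    def cas(self, expected, new_value):
        """Succeed only if the stored (value, tag) pair equals `expected`."""
        if self._state != expected:
            return False
        _, tag = expected
        self._state = (new_value, (tag + 1) % TAG_MOD)
        return True


def stale_cas_succeeds(num_interfering_updates):
    """Return True if a CAS based on an old snapshot still succeeds."""
    obj = TaggedCAS("A")
    snapshot = obj.read()                       # slow process reads ("A", 0)
    for _ in range(num_interfering_updates):    # other processes toggle A <-> B
        current = obj.read()
        new_value = "B" if current[0] == "A" else "A"
        obj.cas(current, new_value)
    return obj.cas(snapshot, "C")               # slow process finally runs its CAS
```

With two interfering updates the value is back to "A" but the tag differs, so the stale CAS fails as intended. With `TAG_MOD` (here four) interfering updates the tag has wrapped around and the stale CAS succeeds.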
ABAs cause problems in algorithms that use some form of memory management, where a pointer 
to some memory space may change its value in an ABA fashion. In this context, memory 
reclamation techniques based on reference counting [32], hazard pointers [20, 21], the repeat-offender 
problem technique m, or the memory reclamation technique introduced in [I] deal with the ABA 
problem. But those techniques are application specific. 

A more methodological approach has been followed by research that showed how a load-linked 
store-conditional (LL/SC) object can be implemented from CAS objects and registers. Such an 
object provides two operations, LL() and SC(), where LL() returns the current value of the object. 
SC(x) may either fail and not change anything, or succeed and write the value x to the object. 
Specifically, an SC(x) operation by process p succeeds if and only if no other SC() operation 
succeeded since p’s last LL(). A Boolean return value of an SC() operation indicates its success 
(True) or failure (False). An extended specification also allows for a VL() (verify-link) operation, 
which does not change the state of the object, but returns False if a successful SC() has been 
performed since the calling process’ last LL(), and True otherwise. LL/SC (or LL/SC/VL) objects 
can in almost all cases replace CAS objects in algorithms, and are an effective way of avoiding 
the ABA problem. Unfortunately, existing multiprocessor systems only provide weak versions of 
LL/SC that restrict programmers severely in how they can use the objects [26], and hence they 
“offer little or no help with preventing the ABA problem” [22]. 

For that reason, a line of research has been dedicated to finding time and space efficient LL/SC 
implementations from CAS objects and registers [2, 5, 15, 16, 20, 22, 26]. While many of those solutions 
are wait-free and often even guarantee constant time execution of each LL() and SC() operation, 
they still have drawbacks: Existing implementations either require unbounded tags (e.g., [26]) and 
thus use unbounded CAS objects or registers, or they need at least linear space. Jayanti and 
Petrovic [15] and Anderson and Moir [2] presented the most space efficient implementations of an 
LL/SC object from bounded CAS objects and registers, which achieve constant step-complexity: they use 
only one CAS object but require O(n) registers. This raises the question whether time efficient 
implementations of LL/SC from a smaller number of bounded CAS objects and registers may exist. 
More generally, in order to understand the power and limits of shared memory primitives, it seems 
important to learn how much time and space is required to avoid or detect ABAs, and not to 
restrict this question to the implementation of LL/SC objects from CAS objects and registers. 

CAS and LL/SC objects have a consensus number of infinity [11], while registers have a consensus 
number of one. Therefore, it is impossible to implement wait-free LL/SC from registers or other 
objects with a bounded consensus number. Time and space lower bounds for implementations of 
LL/SC objects may not necessarily imply that it is the ABA problem that is hard to solve, but 
such lower bounds may follow inherently from other properties of the LL/SC specification. 

Results. To investigate the complexity of detecting or preventing ABAs, we define a natural 
object, the ABA-detecting register. It supports two operations, DRead() and DWrite(). Operation 
DWrite(x) writes value x to the register, and returns nothing. Operation DRead() by process p 
returns, in addition to the value of the register, a Boolean flag, which is True if and only if some 
process executed a DWrite() since p’s last DRead() operation. We distinguish between 
single-writer ABA-detecting registers, where only one dedicated process is allowed to call DWrite(), and 
multi-writer ones that don’t have this restriction. 

A wait-free ABA-detecting register can be implemented from registers, and thus has consensus 
number 1. (Therefore, such registers are weaker with respect to wait-freedom than CAS or LL/SC.) Using a 
single unbounded register with an unbounded tag that gets changed whenever some process writes 
to it, it is trivial to obtain an ABA-detecting register with constant time complexity. But if base 
objects have only bounded size, the situation is completely different: For implementations of 
ABA-detecting registers in a system with n processes and bounded registers, we obtain a linear (in n) 
space lower bound, even if the implementation satisfies only nondeterministic solo-termination (the 
nondeterministic variant of obstruction-freedom), which is a progress condition strictly weaker than 
wait-freedom. The availability of CAS seems to be of little help: For wait-free implementations from 
CAS objects and registers we obtain a time-space tradeoff that is linear in n. The same asymptotic 
time-space tradeoff is obtained if the base objects support arbitrary conditional read-modify-write 
operations [7]. Each conditional operation can be simulated by a single operation on a writable 
CAS object, i.e., an object that supports a Write() operation in addition to Read() and CAS(). 
For that reason we state the lower bound for implementations from conditional read-modify-write 
operations in terms of writable CAS base objects. 
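
The trivial construction from a single unbounded register mentioned above might look roughly as follows. This is a sequential sketch under stated assumptions, not the paper's algorithm: the tag is taken to be a pair (writer ID, per-writer counter), which is one possible way to make tags never repeat, and the names `UnboundedTagABARegister`, `dwrite` and `dread` are hypothetical.

```python
class UnboundedTagABARegister:
    """Sequential sketch of an ABA-detecting register built on one unbounded register.

    Assumption: tags are (writer id, per-writer counter) pairs, so no tag is ever
    reused. The counters grow without bound, which is why the base register must
    be unbounded.
    """

    def __init__(self, initial, num_processes):
        self._register = (initial, None)         # (value, tag); None = never written
        self._counter = [0] * num_processes      # local to each writer
        self._last_tag = [None] * num_processes  # local: tag seen at last DRead()

    def dwrite(self, pid, x):
        self._counter[pid] += 1
        self._register = (x, (pid, self._counter[pid]))  # a single register write

    def dread(self, pid):
        value, tag = self._register                      # a single register read
        changed = tag != self._last_tag[pid]             # fresh tag => some DWrite()
        self._last_tag[pid] = tag
        return value, changed
```

Each operation performs one step on the shared register, so the step complexity is constant; the cost is hidden in the unbounded size of the tag.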

Theorem 1. Any linearizable implementation of a single-writer 1-bit ABA-detecting register from 
m bounded base objects satisfies: 

(a) m ≥ n - 1, if the base objects are bounded registers, and the implementation satisfies 
nondeterministic solo-termination; 

(b) m ≥ (n - 1)/t, if the base objects are bounded CAS objects and registers, and the 
implementation is deterministic and wait-free with worst-case step-complexity at most t; and 

(c) m ≥ (n - 1)/(2t), if the base objects are bounded writable CAS objects, and the 
implementation is deterministic and wait-free with worst-case step-complexity at most t. 
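
As a rough numerical illustration of how the three bounds relate, consider the arbitrarily chosen values n = 65 and t = 4 (these numbers are assumptions made for this example only):

```latex
\begin{align*}
\text{(a)}\quad m &\ge n - 1 = 64, \\
\text{(b)}\quad m &\ge \frac{n-1}{t} = \frac{64}{4} = 16, \\
\text{(c)}\quad m &\ge \frac{n-1}{2t} = \frac{64}{8} = 8 .
\end{align*}
```

In parts (b) and (c) the product of space and time is at least linear in n: m · t ≥ n - 1 and m · t ≥ (n - 1)/2, respectively.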


The requirement that base objects are bounded is necessary for this lower bound, because, as 
mentioned earlier, an ABA-detecting register can be trivially obtained by augmenting a normal 
register with an unbounded tag. 

There is a simple implementation of a (bounded) ABA-detecting register with constant 
step-complexity from a single (bounded) LL/SC/VL object of the same size: Each process uses a local 
variable old. To DWrite(x), the process executes an LL() operation followed by an SC(x). To 
DRead(), the process first executes a VL(). If VL() returns True, the process returns (old, False); 
otherwise, it reads the value of the LL/SC/VL object into old (by executing LL()), and then returns 
(old, True). It is not hard to see that this implementation is linearizable. (See Appendix A for 
the algorithm and proof of correctness.) Thus, by reduction we obtain the same lower bound as 
the one stated in Theorem 1 for implementations of single bit LL/SC/VL. Unfortunately, for that 
reduction the VL() operation is needed, and we do not know how to obtain a similarly 
efficient ABA-detecting register from an LL/SC object that does not support VL(). However, the 
proofs of Theorem 1 can be easily modified to accommodate LL/SC objects: 
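
The reduction described in the paragraph above can be sketched as follows. This is a sequential model only, written for illustration: the classes `LLSCVL` and `ABARegisterFromLLSCVL` are hypothetical names, and two details are assumptions not stated in the text, namely that every process starts with a valid link (as if it had performed an LL() initially) and that a successful SC() invalidates the links of all processes, including the caller's.

```python
class LLSCVL:
    """Sequential model of an LL/SC/VL object for n processes."""

    def __init__(self, initial, n):
        self._value = initial
        # Assumption: each process starts as if it had already executed LL().
        self._linked = [True] * n

    def ll(self, pid):
        self._linked[pid] = True
        return self._value

    def sc(self, pid, x):
        if not self._linked[pid]:
            return False                        # a successful SC() intervened
        self._value = x
        self._linked = [False] * len(self._linked)
        return True

    def vl(self, pid):
        return self._linked[pid]


class ABARegisterFromLLSCVL:
    """ABA-detecting register built from one LL/SC/VL object, as described above."""

    def __init__(self, initial, n):
        self._obj = LLSCVL(initial, n)
        self._old = [initial] * n               # local variable `old` of each process

    def dwrite(self, pid, x):
        self._obj.ll(pid)
        self._obj.sc(pid, x)

    def dread(self, pid):
        if self._obj.vl(pid):
            return self._old[pid], False        # no successful SC() since last LL()
        self._old[pid] = self._obj.ll(pid)
        return self._old[pid], True
```

Both operations take at most two steps on the LL/SC/VL object, which matches the constant step-complexity claimed for the reduction. The sketch says nothing about concurrent interleavings; the linearizability argument is the one referred to in the text.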


Corollary 1. Any linearizable implementation of a single bit LL/SC object from m bounded base 
objects satisfies 

(a) m ≥ (n - 1)/t, if the base objects are bounded CAS objects and registers, and the 
implementation is deterministic and wait-free with worst-case step-complexity at most t; and 

(b) m ≥ (n - 1)/(2t), if the base objects are bounded writable CAS objects, and the 
implementation is deterministic and wait-free with worst-case step-complexity at most t. 
A linear space lower bound (corresponding to Part (a) of Theorem 1) for nondeterministic 
solo-terminating implementations of LL/SC from (even unbounded) registers follows from the fact that 
LL/SC objects are perturbable [17]. 

As in Theorem 1, the assumption that base objects are bounded is necessary, because there is 
an implementation of an LL/SC/VL object from a single unbounded CAS object with constant 
step complexity by Moir [26]. Our time-space tradeoff is asymptotically tight for implementations 
with constant step-complexity, as it matches known upper bounds [2, 15]. We show that it is also 
asymptotically tight for implementations using a single CAS object: 


Theorem 2. A single bounded CAS object suffices to implement a bounded LL/SC/VL object or 
a bounded multi-writer ABA-detecting register with O(n) step-complexity. 


These results raise the question, whether bounded CAS objects are helpful for ABA detection. 
We determine that for this problem bounded CAS objects do not provide additional benefits over 
bounded registers: 


Theorem 3. There is a linearizable wait-free implementation of a multi-writer b-bit ABA-detecting 
register from n + 1 (b + 2 log n + O(1))-bit registers with constant step complexity. 

Not only do our lower bounds show that Anderson’s and Moir’s [2] as well as Jayanti’s and 
Petrovic’s [15] implementations of LL/SC from CAS objects and registers are optimal with respect 
to their time-space product, but they also clearly indicate that ABA detection is inherently 
difficult, even if bounded conditional read-modify-write primitives such as (writable) CAS objects 
are available. Therefore, other primitives that provide a solution to the ABA problem would most 
likely be as difficult to obtain as LL/SC. Our upper bounds demonstrate that bounded CAS objects 
(and in fact any conditional read-modify-write operations) are not more helpful than bounded 
registers with respect to ABA detection. On the other hand, ABA detection is difficult only if base 
objects are bounded, but for our lower bounds it does not matter how large that bound on the size 
of the base object is, as long as it is finite. 


Other Related Work. Our lower bounds use covering arguments. Covering arguments were first 
used by Burns and Lynch [4] to prove a space lower bound for mutual exclusion, and essentially 
all space lower bounds are based on this technique. Examples are space lower bounds for 
one-time test-and-set objects [30], consensus [8], timestamps [5, 9], and the general class of perturbable 
objects [17] (which includes LL/SC among others). These lower bounds have in common that they 
do not apply if CAS objects are available as base objects. (They allow for registers, swap objects, 
and, in case of [17], resettable consensus.) An overview of covering arguments can be found in 
Attiya’s and Ellen’s recent textbook [3]. 

In our time-space tradeoffs we construct executions, where a sequence of operations by a process 
p is interleaved with successful CAS() and Write() operations of other processes, so that p’s steps 
remain “hidden”. Such a technique has also been used by Fich, Hendler, and Shavit [7] to prove 
linear space lower bounds for wait-free implementations of visible objects implemented from 
conditional read-modify-write (i.e., writable CAS) objects. Visible objects include counters, queues, 
stacks, or snapshots. Neither ABA-detecting registers nor LL/SC objects are visible, because they 
can be implemented from a single unbounded CAS object. In fact, we are not aware of any other 
non-trivial lower bounds that, like ours, separate bounded from unbounded base objects. 
Preliminaries. We consider a system with n processes with unique IDs in $\mathcal{P} = \{0, \dots, n-1\}$. 
Processes communicate through shared memory operations, called steps, that are executed on 
atomic base objects provided by the system. Each process executes a possibly nondeterministic 
program. If processes are deterministic, a schedule is a sequence of process IDs, that determines 
the order in which processes execute their steps. If processes are nondeterministic, a schedule is a 
sequence of process IDs together with coin-flips, and it describes the order in which processes take 
steps together with the nondeterministic decisions they make. The sequence of shared memory 
steps taken by processes is called an execution. A history on some implemented object is the sequence 
of method call invocations and responses that occur in an execution on that object. A configuration 
describes the state of the system, i.e., of all processes and all base objects. 

Our implementations are deterministic and wait-free, which means that every method call 
terminates within a finite number of the calling process’ steps, in any execution. The step-complexity of 
a deterministic wait-free method is the maximum number of steps a process needs to terminate the 
method call in any execution. Our lower bounds hold for implementations that satisfy a progress 
condition which is strictly weaker than wait-freedom: A nondeterministic method m satisfies 
nondeterministic solo-termination if, for every process p and every configuration C in which a call 
of method m by p is pending, there is a p-only execution that starts in C and during which p 
finishes method m. For deterministic algorithms, nondeterministic solo-termination is the same 
as obstruction-freedom. Our algorithms are linearizable m, but our lower bounds work for much 
weaker correctness conditions. 


2. Lower Bounds 

For a configuration $C$ and a schedule $\sigma$, let $\mathrm{Exec}(C, \sigma)$ denote the execution arising from processes 
taking steps, starting in configuration $C$, in the order defined by $\sigma$, and using the nondeterministic 
decisions defined by $\sigma$, if the algorithm is nondeterministic. Let $\mathrm{Conf}(C, \sigma)$ denote the configuration 
resulting from execution $\mathrm{Exec}(C, \sigma)$ started in $C$. For two configurations $C$ and $D$ and a schedule 
$\sigma$, we write $C \xrightarrow{\sigma} D$ to indicate that $\mathrm{Conf}(C, \sigma) = D$. Let $C_{\mathrm{init}}$ denote the initial configuration. 
If there exists a schedule $\sigma$ such that $C \xrightarrow{\sigma} D$, then we say $D$ is reachable from $C$, and if $D$ is 
reachable from $C_{\mathrm{init}}$, we simply say $D$ is reachable. 

An execution $E$ or a schedule $\sigma$ is $P$-only for a set $P \subseteq \{0, \dots, n-1\}$ of processes if only 
processes in $P$ take steps during $E$ or $\sigma$, respectively. If $P = \{p\}$ consists of a single process, then 
we sometimes write $p$-only instead of $\{p\}$-only. 

For an execution $E$, let $\prec_E$ denote the happens-before order on operations in $E$, i.e., 
$op \prec_E op'$ ($op$ happens before $op'$) if and only if operation $op$ responds in $E$ before $op'$ gets invoked. 
We write simply $\prec$ instead of $\prec_E$ if it is clear from the context which execution $E$ the relation refers 
to. For a schedule $\sigma$, an execution $E$ and a process $p$, $E|p$ and $\sigma|p$ denote the sub-sequences of 
steps by $p$ in $E$ and in $\sigma$, respectively. 

Two configurations $C$ and $D$ are indistinguishable to process $p$ if every register has the same 
value in $C$ as in $D$, and $p$ is in the same state in both configurations. We write $C \sim_p D$ to denote 
that $C$ and $D$ are indistinguishable to $p$. We write $C \sim_S D$ for a set $S$ of processes to denote that 
$C \sim_p D$ for every process $p \in S$. We say process $p$ is idle in configuration $C$ if it has no pending 
method call, and if all processes are idle, then the configuration is quiescent. A process completes 
a method call in an execution $E$ if that method terminates in $E$. 
For our lower bounds, we do not require that the implementation of the ABA-detecting register is 
linearizable. Instead, we consider methods WeakRead() and WeakWrite() that take no arguments, 
where WeakRead() returns a Boolean value, and WeakWrite() returns nothing. A correct 
concurrent implementation of these methods must guarantee for every execution that a WeakRead() 
operation r by process p returns True if and only if there exists a WeakWrite() operation w such 
that w happens before r and every other WeakRead() operation by p happens before w. 

Linearizability of an ABA-detecting register R guarantees that the operations R.DRead() (in 
place of WeakRead()) and R.DWrite() (in place of WeakWrite()) satisfy the correctness properties 
above. Therefore, every lower bound on the time and/or space complexity for correct 
implementations of those methods implies the same lower bound for linearizable ABA-detecting registers. 

Let $p$ be some process and $C$ a configuration. We say $C$ is $p$-clean if there exists a schedule $\sigma$, 
$C_{\mathrm{init}} \xrightarrow{\sigma} C$, such that $\mathrm{Exec}(C_{\mathrm{init}}, \sigma)$ contains a complete WeakRead() operation $r^*$ by $p$, and every 
WeakWrite() happens before $r^*$. Configuration $C$ is $p$-dirty if there exists a schedule $\sigma$, $C_{\mathrm{init}} \xrightarrow{\sigma} C$, 
such that $\mathrm{Exec}(C_{\mathrm{init}}, \sigma)$ contains a complete WeakWrite() operation $w^*$ such that no WeakRead() by $p$ is 
pending at any point after $w^*$ has been invoked. Note that some configurations are neither $p$-dirty 
nor $p$-clean. 

Throughout this section we assume that each process executes an infinite program, in which it 
repeatedly calls WeakRead() and WeakWrite() methods. More specifically, process 0 repeatedly 
executes WeakWrite(), while every process in $\{1, \dots, n-1\}$ repeatedly calls WeakRead(). 

Then in a $p$-only execution starting from a configuration $C$, the first WeakRead() operation by 
$p$ returns False if $C$ is $p$-clean and True if $C$ is $p$-dirty. Therefore, each process $p$ must be able to 
distinguish $p$-clean configurations from $p$-dirty ones. The full proof of the following observation can 
be found in Appendix B. 


Observation 1. Suppose the WeakRead() method satisfies nondeterministic solo-termination. For 
any process $p \in \{1, \dots, n-1\}$ and any two reachable configurations $C_1$, $C_2$, if $C_1$ is $p$-clean and $C_2$ 
is $p$-dirty, then $C_1 \not\sim_p C_2$. 


2.1. A Space Lower Bound for Implementations from Bounded Registers 

Let $\mathcal{R}$ be a set of $k$ registers and $P$ a set of processes. We say the processes in $P$ cover $\mathcal{R}$ in 
configuration $C$ if for each register $R \in \mathcal{R}$ there is a process in $P$ that is poised to write to $R$. A 
block-write to $\mathcal{R}$ is an execution in which $k$ processes participate, and each of them takes exactly 
one step in which it writes to a distinct register in $\mathcal{R}$. (The only block-write to $\emptyset$ is the empty 
execution.) 
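
The key property of a block-write used in covering arguments is that it erases any trace of earlier writes to the covered registers. The following toy sketch illustrates this; the function names `block_write` and `register_config_unchanged`, and the representation of registers as a Python list and of poised writes as a dictionary, are assumptions made for this example.

```python
def block_write(registers, poised_writes):
    """Apply one write per covering process.

    `poised_writes` maps a register index to the value that the process
    covering this register is poised to write.
    """
    result = list(registers)
    for index, value in poised_writes.items():
        result[index] = value
    return result


def register_config_unchanged(registers, poised_writes, hidden_writes):
    """Compare the register configuration after a block-write with and without
    some earlier writes (`hidden_writes`) by a process outside the covering set."""
    without_hidden = block_write(registers, poised_writes)
    with_hidden = list(registers)
    for index, value in hidden_writes.items():
        with_hidden[index] = value
    with_hidden = block_write(with_hidden, poised_writes)
    return with_hidden == without_hidden
```

If the hidden process writes only to covered registers, the two register configurations coincide; a single write outside the covered set remains visible.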

In the following we assume an implementation of methods WeakRead() and WeakWrite() from 
$m$ bounded registers. The register configuration of a configuration $C$ is an $m$-tuple, $\mathrm{reg}(C) = 
(v_1, \dots, v_m)$, where $v_i$ is the value of the $i$-th register. 


Lemma 1. Suppose methods WeakRead() and WeakWrite() satisfy nondeterministic 
solo-termination. For any quiescent configuration $Q$ and any set $P_k = \{p_1, \dots, p_k\} \subseteq \mathcal{P} \setminus \{0\}$, where 
$k \in \{0, \dots, n-1\}$, there exists a $(P_k \cup \{0\})$-only schedule $\sigma$ such that in $\mathrm{Conf}(Q, \sigma)$ process 0 is 
idle and $k$ distinct registers are covered by $p_1, \dots, p_k$. 


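Lemma 1 immediately implies Theorem 1(a): for $k = n - 1$ it yields a configuration in which $n - 1$ distinct registers are covered, and thus $m \ge n - 1$. 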
Figure 1: Proof of Lemma 1. Let $P_k^0$ denote the set $P_k \cup \{0\}$. Double circles denote quiescent 
configurations. 


Proof of Lemma 1. The proof is by induction on $k$. If $k = 0$, we let $\sigma$ be the empty schedule, and 
the lemma is immediate because $\mathrm{Conf}(Q, \sigma) = Q$ is a quiescent configuration (so 0 is idle). 

Now suppose we have proved the inductive hypothesis for some integer $k < n - 1$. Let $\beta = 
(p_1, \dots, p_k)$ be the schedule in which each of $p_1, \dots, p_k$ takes exactly one step. Let $Q_0 = Q$. By 
the inductive hypothesis there is a schedule $\alpha_1$ such that in $C_1 := \mathrm{Conf}(Q_0, \alpha_1)$ a set $\mathcal{R}_1$ of exactly 
$k$ registers is covered, and process 0 is idle. Hence, $\mathrm{Exec}(C_1, \beta)$ is a block-write to $\mathcal{R}_1$ yielding a 
configuration $D_1 = \mathrm{Conf}(C_1, \beta)$. We let $\gamma_1$ be the schedule such that in $\mathrm{Exec}(D_1, \gamma_1)$ first each 
process in $\{p_1, \dots, p_k\}$ takes enough unobstructed steps to finish its WeakRead() method call, and 
after that process 0 takes enough unobstructed steps to complete exactly one WeakWrite() method. 
Then $Q_1 = \mathrm{Conf}(D_1, \gamma_1)$ is quiescent, and during $\mathrm{Exec}(D_1, \gamma_1)$ exactly one complete WeakWrite() 
gets executed. Repeating this construction (using the inductive hypothesis repeatedly) we obtain a 
schedule $\alpha_1 \beta \gamma_1 \alpha_2 \beta \gamma_2 \alpha_3 \cdots$ and configurations $Q_0, C_1, D_1, Q_1, C_2, D_2, Q_2, \dots$ and sets of $k$ registers 
$\mathcal{R}_1, \mathcal{R}_2, \dots$, such that for any $i \ge 1$: 

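• $Q_{i-1} \xrightarrow{\alpha_i} C_i \xrightarrow{\beta} D_i \xrightarrow{\gamma_i} Q_i$; 

• $Q_i$ is quiescent; 

• during $\mathrm{Exec}(D_i, \gamma_i)$ process 0 executes a complete WeakWrite() operation; and 

• in $C_i$ process 0 is idle and $\mathcal{R}_i$ is covered by $P_k$ (and thus $\mathrm{Exec}(C_i, \beta)$ is a block-write to $\mathcal{R}_i$). 

Since the number of registers is finite, and all registers are bounded, there exist indices $1 \le i < j$ 
such that $\mathrm{reg}(D_i) = \mathrm{reg}(D_j)$. Let $\sigma = \gamma_i \alpha_{i+1} \beta \gamma_{i+1} \alpha_{i+2} \cdots \alpha_j \beta$, i.e., 

$$C_i \xrightarrow{\beta} D_i \xrightarrow{\sigma} D_j. \tag{1}$$
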
This situation is depicted in Figure 1. Now let $\lambda'$ be a $p_{k+1}$-only schedule such that in $\mathrm{Exec}(C_i, \lambda')$ 
process $p_{k+1}$ completes exactly one WeakRead() method call. By the nondeterministic 
solo-termination property, such a schedule $\lambda'$ exists. Let $\lambda$ be the prefix of $\lambda'$ such that $\mathrm{Exec}(C_i, \lambda)$ 
ends when $p_{k+1}$ is poised to write to a register $R \notin \mathcal{R}_i$ for the first time, or $\lambda = \lambda'$ if $p_{k+1}$ finishes 
its WeakRead() method call without writing to a register outside of $\mathcal{R}_i$. 

First assume $\lambda \ne \lambda'$, i.e., in $\mathrm{Exec}(C_i, \lambda)$ process $p_{k+1}$ does not finish its WeakRead() method call, 
but instead the execution ends when $p_{k+1}$ covers a register $R \notin \mathcal{R}_i$. Since in $C_i$ process 0 is idle and 
$\mathcal{R}_i$ is covered by $P_k$, and since $\lambda$ is $p_{k+1}$-only, in configuration $\mathrm{Conf}(C_i, \lambda) = \mathrm{Conf}(Q, \alpha_1 \beta \gamma_1 \cdots \alpha_i \lambda)$ 
processes $p_1, \dots, p_{k+1}$ cover $k + 1$ registers, and process 0 is still idle. This completes the proof of 
the inductive step for $\sigma = \alpha_1 \beta \gamma_1 \cdots \alpha_i \lambda$. 

Now we consider the case $\lambda = \lambda'$, i.e., during $\mathrm{Exec}(C_i, \lambda)$ process $p_{k+1}$ finishes its WeakRead() 
method call without writing to a register outside of $\mathcal{R}_i$. To complete the proof of the lemma, it 
suffices to show that this case cannot occur. (This case is depicted in Figure 1.) 

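Since in $C_i$ the processes in $P_k$ cover $\mathcal{R}_i$, and $p_{k+1}$ only writes to registers in $\mathcal{R}_i$ during $\mathrm{Exec}(C_i, \lambda)$, 
it follows that $\mathrm{Exec}(C_i, \lambda\beta)$ ends with a block-write by $P_k$ in which all writes by $p_{k+1}$ get obliterated. 
In particular, for $D_i' := \mathrm{Conf}(C_i, \lambda\beta)$ we have 

$$D_i' \sim_{\mathcal{P} \setminus \{p_{k+1}\}} D_i. \tag{2}$$
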


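Hence, since schedule $\sigma$ is $(P_k \cup \{0\})$-only, i.e., $p_{k+1}$ does not participate, we obtain $\mathrm{Exec}(D_i', \sigma) = 
\mathrm{Exec}(D_i, \sigma)$, and in particular using Eq. (1) 

$$C_i \xrightarrow{\lambda\beta} D_i' \xrightarrow{\sigma} D_j', \quad \text{where } D_j' \sim_{\mathcal{P} \setminus \{p_{k+1}\}} D_j. \tag{3}$$
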


Now recall that we chose $i$ and $j$ in such a way that $\mathrm{reg}(D_i) = \mathrm{reg}(D_j)$. Thus, from Eq. (2) 
and (3) we get $\mathrm{reg}(D_i') = \mathrm{reg}(D_i) = \mathrm{reg}(D_j) = \mathrm{reg}(D_j')$. Because $D_i' \xrightarrow{\sigma} D_j'$ (Eq. (3)), and since by 
construction only processes $\{0, p_1, \dots, p_k\}$ appear in $\sigma$, $p_{k+1}$ is in $D_i'$ in exactly the same state as 
in $D_j'$. Hence, 

$$D_i' \sim_{p_{k+1}} D_j'. \tag{4}$$



Now recall that $C_i \xrightarrow{\lambda\beta} D_i'$, and in the corresponding execution process $p_{k+1}$ executes a complete 
WeakRead() method, while process 0 takes no steps, and $p_{k+1}$ is idle in $D_i'$. Hence, $D_i'$ is 
$p_{k+1}$-clean. On the other hand, $\mathrm{Exec}(D_i', \sigma) = \mathrm{Exec}(D_i, \sigma)$ starts with a complete WeakWrite() operation 
(during the prefix $\mathrm{Exec}(D_i', \gamma_i)$) by process 0, while process $p_{k+1}$ takes no steps, and thus remains 
idle. It follows that the configuration resulting from that execution, $D_j'$, is $p_{k+1}$-dirty. Summarizing, 
we have two reachable configurations, $D_i'$ and $D_j'$, where one of them is $p_{k+1}$-clean and the other 
one is $p_{k+1}$-dirty, and both are indistinguishable to $p_{k+1}$, according to Eq. (4). This contradicts 
Observation 1. □ 


2.2. A Time-Space Tradeoff for Implementations from CAS Objects 

We now consider deterministic wait-free implementations of WeakRead() and WeakWrite() from $m$ 
writable bounded CAS objects. We assume without loss of generality that every CAS(x,y) operation 
satisfies $x \ne y$. (A CAS(x,x) operation can be replaced by a Read().) 
For any configuration $C$ and any shared CAS object $R$ let $\mathrm{CCov}(C, R)$ and $\mathrm{WCov}(C, R)$ denote 
the sets of processes that are poised in $C$ to execute a CAS() and a Write() operation on 
$R$, respectively. Let $P$ be a set of processes and $C$ a configuration. A schedule $\beta$ is called $P$-successful for $C$ 
if it contains every process in $P$ exactly once, and every step of $\mathrm{Exec}(C, \beta)$ is either a Write() or 
a successful CAS(). If a configuration $C$ has a $P$-successful schedule $\beta$, then we also say execution 
$\mathrm{Exec}(C, \beta)$ is $P$-successful. 
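
The definition of a P-successful schedule can be checked mechanically on a toy configuration. The sketch below is only an illustration: the encoding of poised steps as tuples, the function name `is_p_successful`, and the assumption that the schedule contains no processes outside P are choices made for this example.

```python
def is_p_successful(objects, poised, schedule, P):
    """Check whether `schedule` is P-successful for a toy configuration.

    objects:  dict mapping an object name to its current value
    poised:   dict mapping a process ID to the step it is poised to execute:
              ("write", obj, v), ("cas", obj, x, y) or ("read", obj)
    schedule: list of process IDs, each taking exactly one step
    P:        set of process IDs
    Assumption: a P-successful schedule consists of the processes in P only.
    """
    if sorted(schedule) != sorted(P):
        return False                    # not every process in P exactly once
    state = dict(objects)
    for pid in schedule:
        step = poised[pid]
        kind, obj = step[0], step[1]
        if kind == "write":
            state[obj] = step[2]
        elif kind == "cas" and state[obj] == step[2]:
            state[obj] = step[3]        # successful CAS(x, y)
        else:
            return False                # a Read() or a failing CAS()
    return True
```

For example, with three processes poised on one object R, the order of the steps decides whether every CAS() succeeds.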

As before, we assume that all processes run an infinite loop, where process 0 repeatedly calls 
WeakWrite() while all other processes repeatedly call WeakRead(). 

Lemma 2. Let $P \subseteq \mathcal{P} \setminus \{0\}$, $q \in \mathcal{P} \setminus P$, $q \ne 0$. Let $C$ be a configuration in which either $q$ is 
idle, or in no execution starting from $C$ process $q$ executes more than $t$ shared memory steps before 
finishing a pending WeakRead() call. If $\beta$ is a $P$-successful schedule for $C$, then there is a schedule 
$\sigma$ such that 

$$\mathrm{Conf}(C, \sigma) \sim_{\mathcal{P} \setminus \{q\}} \mathrm{Conf}(C, \beta) \tag{5}$$

and at least one of the following is the case: 

(a) in $\mathrm{Conf}(C, \sigma)$ process $q$ is idle; 

(b) in $\mathrm{Conf}(C, \sigma)$ process $q$ is poised to write to some object $R$ and $|\mathrm{WCov}(C, R) \cap P| \le t$; or 

(c) in $\mathrm{Conf}(C, \sigma)$ process $q$ is poised to execute a CAS(x,y) operation on some object $R$, where $x$ 
is the value of $R$ in configuration $\mathrm{Conf}(C, \sigma)$, and $|\mathrm{WCov}(C, R) \cap P| + |\mathrm{CCov}(C, R) \cap P| \le t$. 


Proof. We prove the lemma by induction on $t$. If $t = 0$, then $q$ is idle in $C$. Hence, for $\sigma = \beta$ we 
obtain Eq. (5) and Case (a). 

Let $op_q$ be the step $q$ is poised to execute in $C$, and let $V$ be the object affected by $op_q$. Further, 
let $\mathrm{val}_C(V)$ denote the value of $V$ in configuration $C$. 

Case 1: First, assume that $op_q$ is a Read() or a CAS() operation. Let $z$ be the first process in $P$ that executes a step in $\mathrm{Exec}(C, \beta)|V$, and let $op_z$ be that step. We construct a two-step schedule $\lambda$ that contains $q$ and $z$, such that

$$C' := \mathrm{Conf}(C, \lambda) \sim_{\mathcal{P} \setminus \{q\}} \mathrm{Conf}(C, z). \tag{6}$$

First suppose $op_z$ is a CAS(a,b) operation and $op_q$ a CAS(x,y) operation that would succeed in $C$ (i.e., $x = val_C(V)$). Then we define $\lambda = (z, q)$. Since $\beta$ is $P$-successful, the CAS(a,b) by $z$ in configuration $C$ succeeds and changes the value of $V$ from $a$ to $b$. In this case, $x = a$, so in the execution $\mathrm{Exec}(C, \lambda)$ the CAS(x,y) by $q$ fails. Eq. (6) follows.

In all other cases (i.e., if $op_z$ is a Write() operation or $op_q$ is a Read() or a CAS() that fails in $C$), we let $\lambda = (q, z)$. Then in $\mathrm{Exec}(C, \lambda)$ either operation $op_q$ does not change the value of object $V$, or $op_z$ executes a Write() and overwrites any changes that may have resulted from $op_q$. It follows that Eq. (6) is true.

Now let $\beta' = \beta|(P \setminus \{z\})$, and recall that $C' = \mathrm{Conf}(C, \lambda)$. Since $\beta$ is $P$-successful and in $\mathrm{Exec}(C, \beta)$ process $z$ executes the first step on $V$, it follows that $\beta'$ is $(P \setminus \{z\})$-successful in $C'$.

Hence, we can apply the inductive hypothesis for configuration $C'$, process set $P' = P \setminus \{z\}$, and schedule $\beta'$, to obtain a schedule $\sigma'$ that satisfies one of the Cases (a)-(c). Let $\sigma = \lambda \circ \sigma'$. Then by construction, $\mathrm{Conf}(C, \sigma) = \mathrm{Conf}(C', \sigma') \sim_{\mathcal{P} \setminus \{q, z\}} \mathrm{Conf}(C', \beta') = \mathrm{Conf}(C, \beta)$. Because of Eq. (6), process $z$ can also not distinguish between $\mathrm{Conf}(C, \sigma)$ and $\mathrm{Conf}(C, \beta)$, so we obtain Eq. (5). If (a) of the inductive hypothesis applies for $C'$ and $\sigma'$, then the same also applies for $C$ and $\sigma$, because $\mathrm{Conf}(C, \sigma) = \mathrm{Conf}(C', \sigma')$. Now suppose that Case (b) applies for $C'$ and $\sigma'$. Let $R$ be the object on which process $q$ is poised to execute a Write() in $\mathrm{Conf}(C', \sigma') = \mathrm{Conf}(C, \sigma)$. Starting from configuration $C'$, process $q$ must finish its WeakRead() method within $t' = t - 1$ steps. Hence, $|\mathrm{WCov}(C', R) \cap P'| < t'$. Since all processes other than $z$ are poised to execute the same step in $C$ as in $C'$, we have $|\mathrm{WCov}(C, R) \cap P| \le |\mathrm{WCov}(C', R) \cap P'| + 1 < t' + 1 = t$. Hence, Case (b) follows for $C$, $\sigma$, and $P$. With exactly the same argument, if Case (c) applies for $C'$, $\sigma'$, and $P'$, then it also applies for $C$, $\sigma$, and $P$.

Case 2: We now assume that in $C$ process $q$ is poised to execute a Write() operation $op_q$ on object $V$. If $|\mathrm{WCov}(C, V) \cap P| < t$, we let $\sigma = \beta$. Then Eq. (5) and Case (b) (for $R = V$) of the lemma are trivially satisfied.

Hence, assume that $|\mathrm{WCov}(C, V) \cap P| \ge t$. Then $\mathrm{Exec}(C, \beta)$ contains at least $t$ writes to $V$. Let $z_1, \dots, z_{\ell-1}$ be the processes accessing $V$ in this order in $\mathrm{Exec}(C, \beta)|V$ before the first write to $V$ occurs, and let $z_\ell$ be the first process writing to $V$. Let $Z = \{z_1, \dots, z_\ell\}$, $\lambda = (z_1, \dots, z_{\ell-1}, q, z_\ell)$, $\gamma = \beta|Z = (z_1, \dots, z_\ell)$, $\beta' = \beta|(P \setminus Z)$ and $P' = P \setminus Z$. Then in $\mathrm{Exec}(C, \gamma \circ \beta')$ all processes in $P$ execute exactly one step, as they do in $\mathrm{Exec}(C, \beta)$, and for each object $U$ the steps executed on $U$ occur in the same order in both executions. Hence, processes cannot distinguish these executions from each other, and in particular

$$\mathrm{Conf}(C, \gamma \circ \beta') = \mathrm{Conf}(C, \beta). \tag{7}$$


In $\mathrm{Exec}(C, \lambda)$, first processes $z_1, \dots, z_{\ell-1}$ execute successful CAS operations on $V$, then $q$ writes to $V$, and finally, $z_\ell$ overwrites what $q$ has written. It follows that

$$C' := \mathrm{Conf}(C, \lambda) \sim_{\mathcal{P} \setminus \{q\}} \mathrm{Conf}(C, \gamma). \tag{8}$$

Combining this with Eq. (7) we obtain that $\beta'$ is $P'$-successful in $C'$. Moreover, since $q$ executed one step in the execution leading from $C$ to $C'$, in any execution starting from $C'$ it finishes its WeakRead() method after at most $t' = t - 1$ steps. Thus, we can apply the inductive hypothesis to obtain a schedule $\sigma'$ such that $\mathrm{Conf}(C', \beta') \sim_{\mathcal{P} \setminus \{q\}} \mathrm{Conf}(C', \sigma')$, and one of Cases (a)-(c) holds. Let $\sigma = \lambda \circ \sigma'$. Then $\mathrm{Conf}(C, \sigma) = \mathrm{Conf}(C', \sigma') \sim_{\mathcal{P} \setminus \{q\}} \mathrm{Conf}(C', \beta') \sim_{\mathcal{P} \setminus \{q\}} \mathrm{Conf}(C, \beta)$, where the last relation follows from Eq. (7) and (8). This proves Eq. (5).

If Case (a) of the inductive hypothesis holds for $C'$ and $\sigma'$, then it is also true for $C$ and $\sigma$ because $\mathrm{Conf}(C, \sigma) = \mathrm{Conf}(C', \sigma')$. Now suppose either Case (b) or (c) applies to $C'$ and $\sigma'$. Let $R$ be the object process $q$ is poised to access in $\mathrm{Conf}(C, \sigma) = \mathrm{Conf}(C', \sigma')$. If $R \neq V$, then by construction we have $|\mathrm{WCov}(C, R) \cap P| = |\mathrm{WCov}(C', R) \cap P'|$ and $|\mathrm{CCov}(C, R) \cap P| = |\mathrm{CCov}(C', R) \cap P'|$. Hence, Case (b) or (c) for $C'$, $\sigma'$, and $P'$ immediately implies the same case for $C$, $\sigma$, and $P$. Finally, suppose Case (b) or (c) occurs for $R = V$. By construction in $\mathrm{Exec}(C, \lambda)$ only one process among all processes in $\mathrm{WCov}(C, V)$ writes to $V$, namely process $z_\ell$. Hence, in the configuration $C'$ reached by $\mathrm{Exec}(C, \lambda)$ all other processes in $\mathrm{WCov}(C, V)$ are still poised to write to $V$. Thus, for $R = V$ we obtain $|\mathrm{WCov}(C', R) \cap P'| = |\mathrm{WCov}(C, R) \cap P| - 1 \ge t - 1 = t'$, so neither Case (b) nor Case (c) can apply to $C'$, $\sigma'$, and $P'$, a contradiction. □
Figure 2: Proof of Lemma 3. Double circles denote quiescent configurations.


Lemma 3. Suppose WeakRead() and WeakWrite() have step complexity at most $t$. For any reachable quiescent configuration $Q$ and any set $P_k = \{p_1, \dots, p_k\} \subseteq \mathcal{P} \setminus \{0\}$, where $k \in \{0, \dots, n - 1\}$, there exists a $(P_k \cup \{0\})$-only schedule $\sigma$ such that $C := \mathrm{Conf}(Q, \sigma)$ satisfies all of the following:

(i) all processes in $\mathcal{P} \setminus P_k$ are idle in $C$;

(ii) there is a $P_k$-successful schedule for $C$; and

(iii) $|\mathrm{WCov}(C, R) \cap P_k| \le t$ and $|\mathrm{CCov}(C, R) \cap P_k| \le t$ for all objects $R$.

Proof. Throughout this proof let $P_k^0$ denote the set $P_k \cup \{0\}$. We prove the lemma by induction on $k$. For $k = 0$, we let $\sigma$ be the empty schedule, so $C = \mathrm{Conf}(Q, \sigma) = Q$. Then $C$ is quiescent and (i) is true. Statements (ii)-(iii) follow immediately from $P_k = \emptyset$.

Now suppose the inductive hypothesis is true for some value of $k \in \{0, \dots, n - 2\}$. We let $Q_0 = Q$ and $P_{k+1} = P_k \cup \{p_{k+1}\}$ for an arbitrary process $p_{k+1} \in \mathcal{P} \setminus P_k^0$. Then, for $i = 1, 2, \dots$ we iteratively construct schedules $\alpha_i, \beta_i, \gamma_i$ and configurations $Q_i, C_i, D_i$, where $Q_{i-1} \xrightarrow{\alpha_i} C_i \xrightarrow{\beta_i} D_i \xrightarrow{\gamma_i} Q_i$, and $\alpha_i, \beta_i, \gamma_i$ are determined as follows: $\alpha_i$ is a $P_k^0$-only schedule that guarantees properties (i)-(iii) from the inductive hypothesis for configuration $Q_{i-1}$; $\beta_i$ is a $P_k$-successful schedule for $C_i$; and $\gamma_i$ is an arbitrary $P_k$-only schedule followed by a 0-only schedule such that $Q_i$ is quiescent, and where $\mathrm{Exec}(D_i, \gamma_i)$ contains exactly one complete WeakWrite() operation by process 0. By the assumption that WeakRead() and WeakWrite() are wait-free, $\gamma_i$ exists.

We define for each configuration $C_i$ a signature, $\mathrm{sig}(C_i)$, which encodes for every process $p$ the exact shared memory operation $p$ is poised to execute next (including its parameters), and for every base object $R$ its value.

Since there is only a finite number of bounded base objects in the system, there is a finite number of signatures, and thus there exist $1 \le i < j$ such that $C_i$ and $C_j$ have the same signature. We let $\lambda = \alpha_1 \beta_1 \gamma_1 \alpha_2 \cdots \alpha_{i-1} \beta_{i-1} \gamma_{i-1}$, and $\lambda' = \gamma_i \alpha_{i+1} \beta_{i+1} \gamma_{i+1} \cdots \alpha_{j-1} \beta_{j-1} \gamma_{j-1}$. From the construction above we have $Q \xrightarrow{\lambda} Q_{i-1} \xrightarrow{\alpha_i} C_i \xrightarrow{\beta_i} D_i \xrightarrow{\lambda'} Q_{j-1} \xrightarrow{\alpha_j} C_j$, where $Q_{i-1}$ and $Q_{j-1}$ are quiescent, $C_i$ satisfies (i)-(iii) from the inductive hypothesis, and $\mathrm{sig}(C_i) = \mathrm{sig}(C_j)$. This situation, as well as the following construction, is depicted in Figure 2.
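
The pigeonhole step above (finitely many signatures, hence some pair of indices with equal signatures) can be illustrated by a small search routine. This is only a sketch: signatures are modelled as arbitrary hashable Python values, and the function name is hypothetical.

```python
def first_repeated_signature(signatures):
    """Return 1-based indices (i, j), i < j, of the first repeated signature.

    `signatures` is the sequence sig(C_1), sig(C_2), ...; returns None if the
    given finite prefix contains no repetition.
    """
    first_seen = {}  # signature -> index of its first occurrence
    for j, sig in enumerate(signatures, start=1):
        if sig in first_seen:
            return first_seen[sig], j
        first_seen[sig] = j
    return None
```

Because the number of distinct signatures is finite, any sufficiently long prefix of the sequence makes this search succeed.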

Now we apply Lemma 2 to configuration $C_i$. For the purpose of applying this lemma, we may assume that in $C_i$ process $p_{k+1}$ has just invoked a WeakRead() operation but not yet executed its first shared memory step during that operation. Hence, in all executions starting from $C_i$, $p_{k+1}$ will finish that pending WeakRead() operation in at most $t$ steps. Then Lemma 2 yields a $P_{k+1}$-only schedule $\sigma_i$ such that for $D'_i = \mathrm{Conf}(C_i, \sigma_i)$

$$D_i \sim_{\mathcal{P} \setminus \{p_{k+1}\}} D'_i \tag{9}$$

and one of the Cases (a)-(c) of Lemma 2 holds. Let $C'_j = \mathrm{Conf}(D'_i, \lambda' \alpha_j)$, and $D'_j = \mathrm{Conf}(C'_j, \beta_i)$. (The use of $\beta_i$ instead of $\beta_j$ is intentional.) Then $Q_{i-1} \xrightarrow{\alpha_i} C_i \xrightarrow{\sigma_i} D'_i \xrightarrow{\lambda' \alpha_j} C'_j \xrightarrow{\beta_i} D'_j$. Since $\lambda' \alpha_j$ does not contain $p_{k+1}$, which is the only process that, according to Eq. (9), may be able to distinguish $D'_i$ from $D_i$, we obtain

$$C'_j \sim_{\mathcal{P} \setminus \{p_{k+1}\}} C_j. \tag{10}$$

Configurations $C_i$ and $C_j$ have the same signature. Therefore, every process is poised to execute the same step in $C_i$ as in $C_j$, and all objects have the same values in both configurations. This, together with the fact that $\beta_i$ is $P_k$-only and every process appears at most once in $\beta_i$, implies

$$\mathrm{Exec}(C_i, \beta_i) = \mathrm{Exec}(C_j, \beta_i) = \mathrm{Exec}(C'_j, \beta_i). \tag{11}$$

Hence, all objects have the same value in $D_i = \mathrm{Conf}(C_i, \beta_i)$ as in $D'_j = \mathrm{Conf}(C'_j, \beta_i)$, and thus from Eq. (9), all objects have the same value in $D'_i$ as in $D'_j$. Since $p_{k+1}$ does not appear in $\lambda' \alpha_j \beta_i$, and thus takes no step in the execution leading from $D'_i$ to $D'_j$, we conclude

$$D'_i \sim_{p_{k+1}} D'_j. \tag{12}$$

Now recall that $\sigma_i$ is the schedule $\sigma$ guaranteed by Lemma 2 (applied with $C = C_i$ and $q = p_{k+1}$), and the lemma guarantees one of three Cases (a)-(c). First, assume Case (a) occurs, i.e., $p_{k+1}$ completes a WeakRead() method call in $\mathrm{Exec}(C_i, \sigma_i)$ (recall that in $C_i$ it had just invoked that method call) and is idle in $D'_i = \mathrm{Conf}(C_i, \sigma_i)$. Since process 0 takes no steps in $\mathrm{Exec}(C_i, \sigma_i)$ it follows that $D'_i$ is $p_{k+1}$-clean. On the other hand, $\mathrm{Exec}(D'_i, \lambda' \alpha_j \beta_i)$ contains no steps by $p_{k+1}$, but instead a complete WeakWrite() by process 0. Hence, $D'_j = \mathrm{Conf}(D'_i, \lambda' \alpha_j \beta_i)$ is $p_{k+1}$-dirty. But this contradicts Observation 1, because according to Eq. (12) process $p_{k+1}$ cannot distinguish $D'_i$ from $D'_j$. Hence, we know that Case (a) from Lemma 2 cannot apply.

Now, suppose that instead Case (b) or (c) applies. We show that statements (i)-(iii) of the lemma are true for $\sigma = \lambda \alpha_i \sigma_i \lambda' \alpha_j$ and $C = \mathrm{Conf}(Q, \sigma) = C'_j$. By the inductive hypothesis (i), in $C_j$ all processes in $\mathcal{P} \setminus P_k$ are idle, so from Eq. (10) it follows that in $C'_j$ all processes in $\mathcal{P} \setminus P_{k+1}$ are idle. This proves (i).

According to Cases (b) and (c) of Lemma 2, in configuration $D'_i$ (and thus also in $C'_j$ and $D'_j$) process $p_{k+1}$ is poised to execute an operation $op$ that is either a Write() or a CAS(x,y) on some object $R^*$. Moreover, in case that $op$ is a CAS(x,y), in configuration $D'_i$ object $R^*$ has value $x$. Then Eq. (12) implies that the value of $R^*$ is also $x$ in configuration $D'_j$, and in particular, if $p_{k+1}$ executes CAS(x,y) in that configuration, that CAS() succeeds. We conclude that in the execution $\mathrm{Exec}(C'_j, \beta_i \circ p_{k+1})$ process $p_{k+1}$ takes exactly one step, which is either a Write() or a successful CAS(x,y). By construction, $\mathrm{Exec}(C_i, \beta_i)$ is $P_k$-successful, and so Eq. (11) implies that $\mathrm{Exec}(C'_j, \beta_i)$ is also $P_k$-successful. It follows that $\mathrm{Exec}(C'_j, \beta_i \circ p_{k+1})$ is $P_{k+1}$-successful, which proves statement (ii).

Finally, since in $C'_j$ process $p_{k+1}$ is poised to execute operation $op$ on $R^*$, and all other processes are poised to execute exactly the same step as in $C_i$, we have: In case $op$ is a Write(), $\mathrm{WCov}(C'_j, R^*) = \mathrm{WCov}(C_i, R^*) \cup \{p_{k+1}\}$, and in case $op$ is a CAS(), $\mathrm{CCov}(C'_j, R^*) = \mathrm{CCov}(C_i, R^*) \cup \{p_{k+1}\}$. All other sets $\mathrm{WCov}(\cdot, \cdot)$ and $\mathrm{CCov}(\cdot, \cdot)$ are the same for $C'_j$ as for $C_i$. Therefore, Cases (b) and (c) of Lemma 2 together with the inductive hypothesis (iii) immediately imply $|\mathrm{WCov}(C'_j, R) \cap P_{k+1}| \le t$ and $|\mathrm{CCov}(C'_j, R) \cap P_{k+1}| \le t$ for all objects $R$. This proves (iii) and completes the inductive step. □


Parts (b) and (c) of Theorem 1 follow immediately from this lemma. (See Appendix B.2 for additional details.) Replacing each WeakWrite() with a LL()/SC() pair by process 0, and accommodating the definition of p-clean and p-dirty configurations, we obtain Corollary 2. (See Appendix B.3 for additional details.)


3. Upper Bounds 


3.1. Constant-Time ABA-Detecting Registers from Registers 


Figure 4 depicts an optimal linearizable implementation of an ABA-detecting register from n + 1 bounded registers with constant step complexity. We use two more registers than needed according to the lower bound in Theorem 1(a).

The main idea of the algorithm is similar to one used in the multi-layered construction of LL/SC/VL from CAS by Jayanti and Petrovic [15], which itself is a modified version of the implementation by Anderson and Moir [2].

Here, we briefly discuss the implementation. A complete correctness proof is provided in Appendix C. We use a shared bounded register $X$ that stores a triple $(x, p, s)$, where $x$ is the value stored in the ABA-detecting register, $p \in \mathcal{P}$ is a process ID, and $s \in \{0, \dots, 2n + 1\}$ is a sequence number. We also use a shared announce array $A[0 \dots n - 1]$, where only process $q$ can write to $A[q]$. Each array entry $A[q]$ stores a pair $(p, s)$, where $p \in \mathcal{P}$ is a process ID and $s \in \{0, \dots, 2n + 1\}$ is a sequence number. Register $X$ is initialized to $(\bot, \bot, \bot)$ and all entries of $A$ are initialized to $(\bot, \bot)$.

In a DWrite(x) operation, the calling process $p$ first determines a suitable sequence number, $s$, using the helper method GetSeq(), and writes the triple $(x, p, s)$ to $X$ (lines 26-27). Method GetSeq() ensures that the sequence number $s$ it returns satisfies the following: If there is any point at which $X = (\cdot, p, s)$ and $A[q] = (p, s)$ for some process $q$, then $p$ will not use sequence number $s$ again in any following DWrite() call, until $A[q] \neq (p, s)$. To achieve that, in a sequence of $n$ consecutive GetSeq() calls process $p$ scans through the entire announce array, reading one array entry with each GetSeq() call. It then returns a sequence number that $p$ has not used in its preceding $n$ DWrite() method calls, and which it has not found in any array entry of $A$, when it read that entry last.
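
A short count shows why the range $\{0, \dots, 2n + 1\}$ always leaves GetSeq() a free sequence number. The following is a sketch based only on the data structures as declared in Figure 4, namely that the local set $na$ holds at most one pair per array index $c$ and that the queue $usedQ$ always holds $n + 1$ entries:

$$\bigl|\{\, i : (j, i) \in na \,\}\bigr| \le n, \qquad |usedQ| = n + 1 \quad\Longrightarrow\quad \bigl|\{\, i : (j, i) \in na \,\} \cup usedQ\bigr| \le 2n + 1 < 2n + 2 = \bigl|\{0, \dots, 2n + 1\}\bigr|.$$

Hence at least one value remains available each time a sequence number is chosen.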
shared: CAS X
local: Boolean b

Method SC_p(x)
 1  if b then return False
 2  for i ← 1 to n do
 3      (y, a) ← X.Read()
 4      if ⌊a/2^p⌋ is odd then            /* if p's bit is 1 */
 5          return False
 6      if X.CAS((y, a), (x, 2^n − 1)) then
 7          return True
 8  return False

Method VL_p()
 9  (x, a) ← X.Read()
10  if (⌊a/2^p⌋ is even ∧ b = False) then  /* if p's bit is 0 and b = False */
11      return True
12  else
13      return False

Method LL_p()
14  (x, a) ← X.Read()
15  if ⌊a/2^p⌋ is even then               /* if p's bit is 0 */
16      b ← False
17      return x
18  else
19      for i ← 1 to n do
20          (x', a') ← X.Read()
21          if X.CAS((x', a'), (x', a' − 2^p)) then   /* try to reset p's bit */
22              b ← False
23              return x'
24      b ← True
25      return x

Figure 3: An LL/SC/VL implementation from bounded CAS.


For its DRead() operation each process $q$ uses a local variable, $b$, that indicates whether a DWrite() operation linearized already during $q$'s previous DRead() operation after that operation's linearization point. Our algorithm ensures that if $b$ = True at the beginning of a DRead() operation, then such a DWrite() has happened. In a DRead() operation, a process reads $X$ twice to obtain the triples $(x, p, s)$ in line 38, and $(x', p', s')$ in line 41. Between those two reads $q$ first reads the old value of $A[q]$ into $(r, s_r)$ (line 39), and then it announces the pair $(p, s)$ by writing it to $A[q]$ (line 40). Now $(r, s_r)$ stores the "old" announcement from $q$'s preceding DRead() operation, and $A[q]$ stores the current one. In lines 42-45, $q$ now decides the return value: If $b$ = True or if $(p, s) \neq (r, s_r)$, then $q$ returns $(x, \text{True})$, and otherwise $(x, \text{False})$. Moreover, in preparation for the next DRead(), if $(x, p, s) = (x', p', s')$, $q$ sets $b$ to False, and otherwise it sets it to True.

First suppose $q$ reads two different triples from $X$ in line 38 and line 41, i.e., $(x, p, s) \neq (x', p', s')$. Then the DRead() operation will linearize with the first read of $X$ (line 38). We now know that the value of $X$ has changed between the linearization point and the response of $q$'s DRead(). Hence, $q$ sets flag $b$ to indicate that its next DRead() should return a pair $(\cdot, \text{True})$. If $(x, p, s) = (x', p', s')$, then, on the other hand, it is ensured that $A[q] = (p, s)$ at the point when $q$ read $(x, p, s)$ from $X$ in line 41. As explained above, in this case the pair $(p, s)$ will not be used again in any following DWrite() operation, until $q$ has replaced its announcement $(p, s)$ with a new one. Hence, $q$ resets $b$ because in the following DRead() operation, $q$ will be able to detect any DWrite() that has happened in between by comparing $A[q]$ with the corresponding pair stored in $X$.
shared:
    Register X = (⊥, ⊥, ⊥)
    Register A[0 ... n − 1] = ((⊥, ⊥), ..., (⊥, ⊥))
local (to each process):
    Boolean b = False
    Queue usedQ[n + 1] = (⊥, ..., ⊥)
    Set na = {}
    int c = 0

Method DWrite_p(x)
26  s ← GetSeq()
27  X.Write(x, p, s)

Method GetSeq_p()
28  (r, s_r) ← A[c].Read()
29  if r = p then
30      na ← (na \ {(c, i) | i ∈ ℕ}) ∪ {(c, s_r)}
31  else
32      na ← na \ {(c, i) | i ∈ ℕ}
33  c ← (c + 1) mod n
34  choose arbitrary s ∈ {0, ..., 2n + 1} \ ({i | (j, i) ∈ na} ∪ usedQ)
35  usedQ.enq(s)
36  usedQ.deq()
37  return s

Method DRead_q()
38  (x, p, s) ← X.Read()
39  (r, s_r) ← A[q].Read()
40  A[q].Write(p, s)
41  (x', p', s') ← X.Read()
42  if (p, s) = (r, s_r) then
43      ret ← (x, b)
44  else
45      ret ← (x, True)
46  if (x, p, s) = (x', p', s') then
47      b ← False
48  else
49      b ← True
50  return ret

Figure 4: An ABA-detecting register implemented from bounded registers.
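
To make the interplay of DWrite(), GetSeq(), and DRead() concrete, the following is a minimal sequential simulation of the pseudocode in Figure 4. It is a sketch under stated assumptions, not a concurrent implementation: each method call runs atomically, the class and method names are our own, `None` stands in for ⊥, the set na is stored as a dictionary from array index to announced sequence number, and "choose arbitrary" in line 34 is resolved by taking the smallest free number.

```python
from collections import deque

BOT = None  # stands in for the initial value "bottom"


class ABARegister:
    """Sequential simulation of Figure 4 for n processes (no real concurrency)."""

    def __init__(self, n):
        self.n = n
        self.X = (BOT, BOT, BOT)        # shared triple (value, writer id, seq)
        self.A = [(BOT, BOT)] * n       # announce array; A[q] is written only by q
        # Local state, one copy per process:
        self.b = [False] * n
        self.usedQ = [deque([BOT] * (n + 1)) for _ in range(n)]
        self.na = [dict() for _ in range(n)]  # array index c -> announced seq
        self.c = [0] * n

    def _get_seq(self, p):
        c = self.c[p]
        r, s_r = self.A[c]              # line 28
        if r == p:                      # lines 29-32: refresh what is known about A[c]
            self.na[p][c] = s_r
        else:
            self.na[p].pop(c, None)
        self.c[p] = (c + 1) % self.n    # line 33
        blocked = set(self.na[p].values()) | set(self.usedQ[p])
        # line 34: smallest sequence number that is neither announced nor recently used
        s = next(i for i in range(2 * self.n + 2) if i not in blocked)
        self.usedQ[p].append(s)         # line 35
        self.usedQ[p].popleft()         # line 36
        return s

    def dwrite(self, p, x):
        s = self._get_seq(p)            # line 26
        self.X = (x, p, s)              # line 27

    def dread(self, q):
        x, p, s = self.X                # line 38
        r, s_r = self.A[q]              # line 39
        self.A[q] = (p, s)              # line 40
        x2, p2, s2 = self.X             # line 41
        ret = (x, self.b[q]) if (p, s) == (r, s_r) else (x, True)  # lines 42-45
        self.b[q] = (x, p, s) != (x2, p2, s2)                      # lines 46-49
        return ret
```

In this simulation, if process 0 writes "a", then "b", then "a" again between two DRead() calls of process 1, the second DRead() returns ("a", True): the value is unchanged, but the differing sequence number reveals the intermediate writes.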


3.2. LL/SC/VL from a Single Bounded CAS 


We now briefly sketch our wait-free implementation of LL/SC/VL from a single bounded CAS object. The implementation has O(n) step complexity, and thus, by Corollary 2, is optimal. The pseudo-code is presented in Figure 3, and correctness proofs can be found in Appendix D.

In a CAS object $X$, we store a pair $(x, a)$, where $x$ represents the value of the implemented LL/SC/VL object, and $a$ is an $n$-bit string. The $p$-th bit of $a$ is used to indicate whether an SC() operation linearized since $p$'s last LL() (the bit is usually set in this case). As in the previous algorithm, we use a local variable $b$ for each process $p$. In an LL() call, a process $p$ tries to reset its bit (the $p$-th bit of the second component) of $X$. As we explain below, this may fail, but only if an SC() $sc$ linearizes during $p$'s attempts to reset that bit. If that happens, $p$ sets the flag $b$ and its LL() linearizes before $sc$. Thus, in a subsequent SC() or VL(), $p$ determines from the set flag $b$ that it does not have a valid link, and that SC() or VL() can fail, even though $p$'s bit in $X$ is not set.

More precisely, in a LL() method call, process $p$ reads the pair $(x, a)$ from $X$ (line 14) and checks whether its bit in $a$ is set. If not, in lines 16-17 it simply resets $b$ (because in subsequent SC() or VL() calls $p$'s bit in $X$ will indicate whether $p$ has a valid link or not) and returns $x$. That LL() operation linearizes with the Read() of $X$ in line 14. Now suppose $p$'s bit in $X$ is set. Then $p$ tries to reset that bit, using a CAS() operation on $X$. However, that CAS() may fail because of some other process' successful CAS() during a LL() or SC() call. Therefore, $p$ repeatedly reads $X$ followed by a CAS() to reset its bit in the second component of $X$, until its CAS() operation succeeds, or until it has failed $n$ times (lines 20-21). If a CAS() succeeds, $p$ resets $b$ and returns the first component of $X$ that it read just before its last, successful CAS() attempt (lines 22-23): the LL() linearizes with that CAS(), and since $p$'s bit in $X$ is now reset, in the next SC() or VL() operation $p$ can use its bit in $X$ to determine whether an SC() linearized since the linearization point of the current LL(). If $p$'s CAS() fails $n$ times, then $X$ must have changed $n$ times since $p$'s first Read() of $X$. We argue that then at least one such change must be due to a CAS() operation during some process' SC(): Suppose not. Then $X$ must have changed at least $n$ times, and every time it must have changed because of a CAS() executed in a LL() operation. But this is not possible, because each time such a CAS() succeeds, one of the bits in the second part of $X$ changes from 1 to 0, and $p$'s bit does not change at all. We conclude that at least once, while $p$ has been trying to reset its bit in $X$, a successful CAS() on $X$ must have occurred during an SC() operation. As we discuss below, this means that a successful SC() linearized. Hence, in this case $p$ can set its flag $b$ to True (line 24), which guarantees that $p$'s next SC() or VL() will fail, and return in line 25 the value $x$ it read at the very beginning from $X$. The linearization point of that LL() is the Read() of $X$ in line 14.


In an SC(y) operation a process $p$ first checks flag $b$, and if it is set, $p$ immediately returns False; this indicates that an SC() linearized during $p$'s last LL() but after the linearization point of that LL(). If $b$ is not set, then $p$ reads $X$ to determine whether its bit in $X$ is set, and if yes, it can also return False (lines 3-5), because this indicates that some other SC() has linearized since $p$'s last LL(). If $p$'s bit in $X$ is not set, then $p$ tries to write $(y, 2^n - 1)$ into $X$ using a CAS() operation (line 6). If that CAS() succeeds, as a result the value of the LL/SC/VL object changes to $y$, and the bits of all processes are now set in $X$. Hence, $p$'s SC(y) linearizes with that successful CAS(), and $p$ returns True (line 7). If the CAS() fails, then $p$ repeats up to $n$ times, until either it finds that its bit in $X$ is set (and thus some other process' SC() succeeded), or its own CAS() succeeds. If $p$'s CAS() fails $n$ times, then for the same reasons as explained earlier, we know that some process' SC() must have linearized during $p$'s ongoing SC(y) operation, and thus $p$ can return False (the unsuccessful SC() linearizes with its response).

Operation VL() is very simple: A process simply checks whether flag $b$ or its bit in $X$ is set, and if yes, it returns False, otherwise it returns True.
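
The three operations can be sketched as a sequential simulation of Figure 3. The code below is illustrative only: each method call runs atomically (so the retry loops never observe interference), the class names `CASObject` and `LLSCVL` are hypothetical, process IDs are assumed to be 0, ..., n − 1, and the initial bit string is assumed to be all zeros, which the text above does not specify.

```python
class CASObject:
    """A single CAS object holding an arbitrary comparable value."""

    def __init__(self, value):
        self.value = value

    def read(self):
        return self.value

    def cas(self, expected, new):
        if self.value == expected:
            self.value = new
            return True
        return False


class LLSCVL:
    """Sequential sketch of Figure 3; X stores (value, n-bit mask a)."""

    def __init__(self, n, initial=None):
        self.n = n
        self.X = CASObject((initial, 0))  # assumption: all bits initially clear
        self.b = [False] * n              # local flag b, one per process

    def ll(self, p):
        x, a = self.X.read()                              # line 14
        if (a >> p) & 1 == 0:                             # line 15: p's bit is 0
            self.b[p] = False
            return x
        for _ in range(self.n):                           # lines 19-23
            x2, a2 = self.X.read()
            if self.X.cas((x2, a2), (x2, a2 - (1 << p))):  # try to reset p's bit
                self.b[p] = False
                return x2
        self.b[p] = True                                  # lines 24-25
        return x

    def sc(self, p, x):
        if self.b[p]:                                     # line 1
            return False
        for _ in range(self.n):                           # lines 2-7
            y, a = self.X.read()
            if (a >> p) & 1:                              # p's bit is 1
                return False
            if self.X.cas((y, a), (x, (1 << self.n) - 1)):  # set all bits
                return True
        return False                                      # line 8

    def vl(self, p):
        _, a = self.X.read()                              # line 9
        return (a >> p) & 1 == 0 and not self.b[p]        # lines 10-13
```

A successful SC() sets every bit of the mask, so every process, including the caller, must execute a new LL() before its next SC() or VL() can succeed.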
A. Implementation of ABA-Detecting Registers from LL/SC/VL


Theorem 4. There is an implementation of an ABA-detecting register from a single LL/SC/VL object, such that each DRead() and DWrite() operation takes only two shared memory steps.


Proof. It suffices to show that the implementation in Figure 5 is linearizable.

Consider a history $H$ on the implemented ABA-detecting register. We assume w.l.o.g. that all operations in $H$ complete.

We say a VL() operation succeeds if it returns True. Note that we assumed w.l.o.g. (see the description of Figure 5) that the first X.VL() call by a process $q$ succeeds, even if $q$ has not called X.LL(), provided that no X.SC() call has been executed, either. Therefore, for the purpose of this proof we may assume w.l.o.g. that the history $H$ starts with $n$ complete X.LL() operations, one by each process. We say a process $p$ has a valid link on $X$, whenever no successful X.SC() has occurred since $p$'s last X.LL() operation.

For each DWrite() and DRead() operation $op$ by process $p$ respectively $q$, we define a point in time, $\ell(op)$, as follows. If $op$ is a DWrite() by process $p$, and $p$'s SC() in line 52 is successful, then $\ell(op)$ is the point when that successful SC() gets executed. If the SC() is unsuccessful, then $\ell(op)$ is the point immediately before the first successful SC() that gets executed after $p$'s LL() in line 51 (such an SC() must occur before $p$'s SC() in line 52, because that SC() by $p$ fails). If $op$ is a DRead() by $q$, then $\ell(op)$ is the point of the last shared memory operation executed by $q$ during $op$ (which is either X.VL() in line 53 or X.LL() in line 54).

We prove below that $\ell()$ maps each operation $op$ to a linearization point of $op$; therefore, we say that $op$ linearizes at $\ell(op)$. Each point $\ell(op)$ occurs between the invocation and the response of $op$. It suffices to show that the history obtained by ordering all operations in $H$ by $\ell(op)$ is valid. Note that every DWrite() operation linearizes either at or immediately before the point of some successful SC(). Therefore, at any point the value of $X$ is equal to the value of the DWrite() operation that linearized last.

Now consider a DRead() operation $op$ by process $q$. Initially, the value of $old$ equals the value of $X$, and $q$ has a valid link on $X$. It follows from the structure of the DRead() operation that the following invariant is maintained: At any point throughout $H$, $q$ has a valid link on $X$ if and only if no successful X.SC() has been executed on $X$ since $q$'s last X.LL() or its last successful X.VL(), whichever came later. Since a DWrite() operation linearizes at some point $t$ if and only if a successful X.SC() operation is executed at point $t$, and a DRead() operation linearizes with its successful X.VL() or, in case of an unsuccessful X.VL(), with its X.LL(), we obtain: Process $q$ has a valid link on $X$ if and only if no DWrite() has linearized since $q$'s last DRead() operation linearized (or since the beginning of the history, if none of $q$'s DRead() operations have linearized, yet).


shared: LL/SC Object X = ⊥
local: old = ⊥

Method DWrite_p(x)
51  X.LL()
52  X.SC(x)

Method DRead_q()
53  if X.VL() then return (old, False)
54  old := X.LL(); return (old, True)

Figure 5: Implementation of an ABA-detecting register from LL/SC/VL. For ease of description but w.l.o.g. we assume that, even if q has not called X.LL(), an X.VL() call by process q returns True as long as no successful X.SC() has been executed.

Now suppose $op$ linearizes with the X.VL() operation in line 53, i.e., that VL() operation is successful. Then $q$ returns $(old, \text{False})$. The second component, False, is correct because at $\ell(op)$ process $q$ has a valid link on $X$, i.e., no DWrite() operation has linearized since the linearization point of $q$'s preceding DRead() operation (or since the beginning of the execution, if $op$ is $q$'s first DRead() operation). The first component, $old$, is also correct, because $q$ has a valid link on $X$, which means that the value of $X$ cannot have changed since $q$'s last LL() in which it obtained the value of $old$ from $X$ (or since the beginning of the execution, when $X = old = \bot$).

Finally, suppose $op$ linearizes with the X.LL() operation in line 54, i.e., the preceding X.VL() operation in line 53 fails. Then $op$ returns in line 54. Moreover, $q$ has no valid link at the point of that X.VL(), and thus also not immediately before $\ell(op)$. Hence, a DWrite() operation has linearized since the linearization point of $q$'s preceding DRead() operation (or since the beginning of the execution), and so the second component of the return value, True, is correct. The first component of the return value is the value of $X$ at $\ell(op)$, which is also correct. □
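
The construction of Figure 5 can be exercised with a small sequential sketch. The helper class `IdealLLSC` below is a hypothetical, idealized LL/SC/VL object introduced only for this illustration (it is not the bounded implementation of Figure 3); following the convention in the caption of Figure 5, every process initially holds a valid link. All class and method names are our own.

```python
class IdealLLSC:
    """Idealized sequential LL/SC/VL object (hypothetical helper)."""

    def __init__(self, n, value=None):
        self.value = value
        self.valid = [True] * n  # per the Figure 5 caption, links start out valid

    def ll(self, p):
        self.valid[p] = True
        return self.value

    def sc(self, p, v):
        if not self.valid[p]:
            return False
        self.value = v
        self.valid = [False] * len(self.valid)  # a successful SC breaks all links
        return True

    def vl(self, p):
        return self.valid[p]


class ABAFromLLSC:
    """Sequential sketch of Figure 5: two shared-memory steps per operation."""

    def __init__(self, n):
        self.X = IdealLLSC(n)
        self.old = [None] * n  # local variable old, one per process

    def dwrite(self, p, x):
        self.X.ll(p)                    # line 51
        self.X.sc(p, x)                 # line 52

    def dread(self, q):
        if self.X.vl(q):                # line 53: link still valid, nothing was written
            return (self.old[q], False)
        self.old[q] = self.X.ll(q)      # line 54: re-link and report a write
        return (self.old[q], True)
```

As in the proof, a DRead() returns False in its second component exactly when the reader's link has survived since its previous DRead().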


B. Additional Details on the Lower Bound Proofs 


B.1. Proof of Observation 1


Proof. For the purpose of a contradiction, suppose $C_1 \sim_p C_2$. Let $\alpha_i$, $i \in \{1, 2\}$, be the schedules such that $C_{init} \xrightarrow{\alpha_i} C_i$, and

• in $E_1 := \mathrm{Exec}(C_{init}, \alpha_1)$ $p$ executes at least one complete WeakRead(), and the last one, $r^*$, happens after any WeakWrite(); and

• in $E_2 := \mathrm{Exec}(C_{init}, \alpha_2)$ there is a complete WeakWrite() $w^*$ (by process 0) that overlaps with no WeakRead() by $p$, and $p$ invokes no WeakRead() after $w^*$.

By the nondeterministic solo-termination property, there exists a $p$-only schedule $\lambda$, such that in $\mathrm{Exec}(C_2, \lambda)$ process $p$ completes exactly one WeakRead() method call $r$. Then in $E_2 \circ \mathrm{Exec}(C_2, \lambda)$ the WeakWrite() $w^*$ happens before $r$ and after any preceding WeakRead() by $p$. Therefore, by the specification of WeakWrite() and WeakRead(), operation $r$ returns True.

Since $C_1 \sim_p C_2$, process $p$ also completes its WeakRead() $r$ with return value True during $\mathrm{Exec}(C_1, \lambda)$. But since $C_1$ is $p$-clean, there is now a complete WeakRead() operation $r^*$ by process $p$ that precedes $r$ in $\mathrm{Exec}(C_{init}, \alpha_1 \circ \lambda)$, and any WeakWrite() operation happens before $r^*$. By the specification of WeakWrite() and WeakRead(), operation $r$ returns False, a contradiction. □


B.2. Additional Details on the Proof of Theorem 1 


Parts (b) and (c) of Theorem 1 follow almost immediately from Lemma 3 as follows.

By Lemma 3, for $P = \{1, \dots, n - 1\}$, there is a reachable configuration $C$ which has a $P$-successful schedule, and for every object $R$, $|\mathrm{WCov}(C, R) \cap P| \le t$ and $|\mathrm{CCov}(C, R) \cap P| \le t$. Hence, if the implementation uses $m$ writable CAS objects, then in $C$ at most $2t$ processes are poised to access each CAS object, and thus

$$2mt \ge n - 1.$$

Similarly, if each of the $m$ base objects supports, in addition to Read(), only one of the two operations, CAS() or Write(), then in $C$ at most $t$ processes are poised to access each object, and so

$$mt \ge n - 1.$$

The result now follows immediately by solving for $m$.
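
Written out, solving the two inequalities for $m$ gives the following. The numeric values $n = 65$ and $t = 4$ are chosen here purely for illustration and do not appear in the paper.

$$2mt \ge n - 1 \iff m \ge \frac{n - 1}{2t}, \qquad mt \ge n - 1 \iff m \ge \frac{n - 1}{t}.$$

$$n = 65,\ t = 4: \qquad m \ge \frac{64}{8} = 8 \ \text{(writable CAS objects)}, \qquad m \ge \frac{64}{4} = 16 \ \text{(objects supporting only one of CAS() or Write())}.$$

In both cases the product of the number of objects and the step complexity grows linearly in $n$.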


B.3. Lower Bounds for Implementations of LL/SC 


We can easily modify the lower bounds for implementations of ABA-detecting registers to obtain 
the same lower bounds for implementations of LL/SC objects from bounded CAS objects and 
registers. 

Consider a linearizable implementation of the methods SC() and LL(). Given a process $p \in \mathcal{P} \setminus \{0\}$, we define $p$-clean and $p$-dirty configurations in almost the same way as for WeakRead() and WeakWrite(): Configuration $C$ is $p$-clean, if there exists a schedule $\alpha$, $C_{init} \xrightarrow{\alpha} C$, such that $\mathrm{Exec}(C_{init}, \alpha)$ contains a complete LL() operation $r^*$ by $p$, and any (successful or unsuccessful) SC() happens before $r^*$. Configuration $C$ is $p$-dirty, if there exists a schedule $\alpha$, $C_{init} \xrightarrow{\alpha} C$, such that $\mathrm{Exec}(C_{init}, \alpha)$ contains a successful SC() $w^*$, and no LL() by $p$ is pending at any point after $w^*$ has been invoked.


We observe analogously to Observation 1:


Observation 2. Suppose the LL() and SC() methods satisfy nondeterministic solo-termination. For any process $p \in \mathcal{P} \setminus \{0\}$ and any two reachable configurations $C_1, C_2$, if $C_1$ is $p$-clean and $C_2$ is $p$-dirty, then $C_1 \not\sim_p C_2$.


Proof. For the purpose of a contradiction, suppose $C_1 \sim_p C_2$. Let $\alpha_i$, $i \in \{1, 2\}$, be the schedules such that $C_{init} \xrightarrow{\alpha_i} C_i$, and

• in $E_1 := \mathrm{Exec}(C_{init}, \alpha_1)$ $p$ executes at least one complete LL(), and the last one, $r^*$, happens after any SC(); and

• in $E_2 := \mathrm{Exec}(C_{init}, \alpha_2)$ there is a complete successful SC() $w^*$ that overlaps with no LL() by $p$, and $p$ invokes no LL() after $w^*$.

Then process $p$ must be idle in configuration $C_2$, and thus also in configuration $C_1$. The implementation of LL()/SC() must remain correct, if starting in configuration $C_1$ respectively $C_2$ process $p$ executes a SC(y) call, where $y$ is an arbitrary value. In a long enough $p$-only execution, that SC(y) call must complete, but since $C_1$ and $C_2$ are indistinguishable, the resulting $p$-only executions starting in $C_1$ respectively $C_2$ must be the same. I.e., there is a $p$-only execution $E$ such that $E_1 \circ E$ and $E_2 \circ E$ are also executions, and in $E$ process $p$ completes exactly one SC() method $s^*$. In $E_1 \circ E$, all SC() operations except for $s^*$ terminate before $p$'s last LL(), $r^*$, and that LL() is followed by $p$'s SC(), $s^*$. By the semantics of LL/SC, $s^*$ succeeds. On the other hand, in $E_2 \circ E$ a successful SC() $w^*$ happens before $p$'s SC() $s^*$ and no LL() by $p$ is pending at any point after the invocation of $w^*$. Hence, $s^*$ fails, which is a contradiction. □


Now, in the proofs of Lemmas 1 and 3, we can simply replace every occurrence of WeakRead() with LL(), and every occurrence of WeakWrite() with a LL() followed by an SC(). With those replacements, every step of the proofs of Lemmas 1 and 3 holds vacuously. (Observe that in the execution constructed in the original proofs, only process 0 calls WeakWrite(), so if we make the replacements as described, each SC() must succeed.) This yields Corollary 2.


C. Proof of Theorem 3 


To prove Theorem 3 it is enough to show that the implementation of the ABA-detecting register in Figure 4 is linearizable.


In every line of the code, at most one shared memory operation is executed. Consider some history H in which processes execute DRead() and DWrite() operations. For any operation m by process p and any line number k, we let t^m_k denote the point in time when p executes its shared memory operation in line k during m. Further, inv(m) and rsp(m) denote the points in time of the invocation respectively response of m. We say operation m completes in some interval [t, t'], if [inv(m), rsp(m)] ⊆ [t, t']. 

Consider some history H on the ABA-detecting register. We define the linearization point £(m) of each operation m as follows. A DWrite() operation dw in H linearizes when the value of X is updated in line 27 of dw (i.e., £(dw) = t^{dw}_{27}). For a DRead() operation dr by some process q, we define £(dr) = t^{dr}_{38} if b = True at rsp(dr), and otherwise £(dr) = t^{dr}_{41}. Clearly the linearization point of each operation is between the invocation and response of the operation. It remains to show that the history S_H obtained by ordering all operations by their linearization points is valid. For that, we first prove the following auxiliary claims. 
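For orientation, the following is a minimal sequential sketch of the register that the claims below reason about. It is reconstructed only from the line references in this proof, so Figure 4 remains the authoritative description. The class and method names, the sequence-number domain {0, ..., 2n+1}, and the rule "pick the smallest free value" are assumptions made for illustration; each method runs atomically here, so the concurrent interleavings analysed in the proof are not modelled.

```python
from collections import deque


class ABADetectingRegister:
    """Sequential sketch for n processes (hypothetical names, not the paper's code)."""

    def __init__(self, n):
        self.n = n
        self.X = (None, None, None)            # (value, writer id, sequence number)
        self.A = [(None, None)] * n            # A[q]: pair announced by reader q
        self.b = [False] * n                   # local flag b of each process
        self.c = [0] * n                       # local scan index c of each process
        self.na = [dict() for _ in range(n)]   # local set na, stored as {reader: seq}
        self.usedQ = [deque([None] * (n + 1)) for _ in range(n)]

    def get_seq(self, p):
        c = self.c[p]
        r, s_r = self.A[c]                     # refresh na for reader c (lines 28-32)
        if r == p:
            self.na[p][c] = s_r
        else:
            self.na[p].pop(c, None)
        self.c[p] = (c + 1) % self.n           # c advances by 1 modulo n
        blocked = set(self.na[p].values()) | set(self.usedQ[p])
        # Assumed domain of size 2n + 2, so a free value always exists.
        s = next(v for v in range(2 * self.n + 2) if v not in blocked)  # line 34
        self.usedQ[p].append(s)                # line 35: enqueue s
        self.usedQ[p].popleft()                # line 36: dequeue one element
        return s

    def dwrite(self, p, x):
        s = self.get_seq(p)                    # line 26
        self.X = (x, p, s)                     # line 27

    def dread(self, q):
        x, p, s = self.X                       # line 38
        r, s_r = self.A[q]                     # read q's previous announcement
        self.A[q] = (p, s)                     # line 40
        x2, p2, s2 = self.X                    # line 41
        ret = (x, self.b[q]) if (p, s) == (r, s_r) else (x, True)  # lines 42-45
        self.b[q] = (x, p, s) != (x2, p2, s2)  # lines 46-49
        return ret
```

In this sequential setting a DRead() following a DWrite() reports True even when the same value is written again, because the writer is forced to use a sequence number that differs from the one announced in A[q].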


Claim 1. For every complete DRead() operation dr in which process q reads (x, p, s) in line 38 and (x', p', s') in line 41, if b = True at rsp(dr), then some process writes to X during [£(dr), rsp(dr)], and otherwise at £(dr), we have A[q] = (p, s) = (p', s') and (x, p, s) = (x', p', s'). 

Proof. First suppose b = True at rsp(dr), and thus £(dr) = t^{dr}_{38}. This implies that q executes line 49 of dr, and so (x, p, s) ≠ (x', p', s'). I.e., q reads different triples from X in line 38 and in line 41. Therefore, a process writes to register X during [t^{dr}_{38}, t^{dr}_{41}] ⊆ [£(dr), rsp(dr)]. 

Now suppose b = False at rsp(dr), and thus £(dr) = t^{dr}_{41}. It also implies that q executes line 47 of dr and hence (x, p, s) = (x', p', s'). Process q writes the pair (p, s) to A[q] at t^{dr}_{40} and it does not change it before t^{dr}_{41}. Thus, A[q] = (p, s) = (p', s') at t^{dr}_{41} = £(dr). □ 

Claim 2. Consider two GetSeq() operations gs_1 and gs_2 by the same process p, which both return the same value s. Then p completes at least n GetSeq() calls between gs_1 and gs_2. 

Proof. This follows from the fact that before a process returns a sequence number s in a GetSeq() call, it enqueues it in line 35 in the queue usedQ of length n + 1. After that, according to line 34 it does not choose s again until s has been removed from the queue, and in every GetSeq() call only one element gets dequeued (in line 36). □ 
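The queue argument can be checked on a small model. The sketch below isolates only the usedQ mechanism of GetSeq() (it ignores the set na) and records how many calls lie strictly between two calls that return the same value. The function name, the parameter `domain_size`, and the "smallest free value" choice are assumptions for illustration.

```python
from collections import deque


def reuse_gaps(n, calls, domain_size):
    """Model the usedQ part of GetSeq() and return the gaps between repeated values."""
    used_q = deque([None] * (n + 1))   # queue of length n + 1
    last_seen = {}                     # value -> index of the call that last returned it
    gaps = []
    for i in range(calls):
        # Choose a value that is currently not in the queue.
        s = next(v for v in range(domain_size) if v not in used_q)
        used_q.append(s)               # enqueue the chosen value
        used_q.popleft()               # dequeue exactly one element per call
        if s in last_seen:
            gaps.append(i - last_seen[s] - 1)   # calls strictly between the two
        last_seen[s] = i
    return gaps
```

Calling `reuse_gaps(n, calls, n + 2)` uses the smallest domain for which a free value always exists in this model; every recorded gap is then at least n, in line with the claim.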


Claim 3. Suppose X = (x, p, s) at some point t for some triple (x, p, s), and A[q] = (p, s) throughout [t, t'], for some t' > t. Then, p does not write (x', p, s) into X during (t, t'], for any value of x'. 

Proof. As X = (x, p, s) at t, p must have written that triple to X before t in a DWrite(x) call. Let gs be the GetSeq() operation that p executed during that DWrite(x) call. Then gs responds before t and returns s, and p can complete at most one additional GetSeq() operation after gs and before t (this may happen during its following DWrite() call, just before it writes to X again). Note that not more than one GetSeq() operation can be invoked after gs and before t. 


Suppose for the sake of contradiction that p writes (x', p, s) to X during (t, t'] in line 27 of some DWrite() operation, for some value x'. Thus, p completes a GetSeq() operation (in line 26 of the same DWrite() operation) during [rsp(gs), t'], such that it returns s. Let gs' be the first such GetSeq() operation. 

By Claim 2, p completes at least n GetSeq() operations gs_1, ..., gs_n, executed in the same order, during [rsp(gs), inv(gs')]. As at most one GetSeq() operation can respond after gs and before t, only gs_1 can get invoked and respond during [rsp(gs), t]. Thus, 

gs_2, ..., gs_n, gs' all complete, in the same order, during [t, t'].    (13) 


As p increments its local variable c by 1 modulo n during each GetSeq() operation, c = q at the invocation of some GetSeq() operation gs'' ∈ {gs_2, ..., gs_n, gs'}. By the assumption, A[q] = (p, s) throughout [t, t'], and therefore, by Eq. (13), A[q] = (p, s) throughout the execution of gs''. Thus, p reads (p, s) from A[q] and adds (q, s) to its set na in lines 28-30 of gs''. Process p only removes (q, s) from na when it reads a new pair (p, s'), for s' ≠ s, from A[q] (lines 29-32). Hence, (q, s) remains in p's set na until some time after the value stored in A[q] changes, which is after t'. 

Therefore, no GetSeq() operation by p that executes line 34 after p has added (q, s) to na in gs'', and no later than t', returns s. But by Eq. (13), gs' executes line 34 in that interval, because gs'' ∈ {gs_2, ..., gs_n, gs'} and gs' is the last operation executed in this set. Hence, gs' does not return s, which contradicts the assumption. □ 


Claim 4. Consider two consecutive DRead() operations dr_1 and dr_2 by the same process q. Suppose b = False at inv(dr_2), and the if-condition in line 42 of dr_2 evaluates to True. Then no process writes to X during [£(dr_1), £(dr_2)]. 

Proof. Let (x_1, p_1, s_1) and (x_2, p_2, s_2) be the triples that process q reads from X in line 38 of dr_1 respectively dr_2. As the value of A[q] is only modified in line 40 of a DRead() operation by q, A[q] = (p_1, s_1) throughout (t^{dr_1}_{40}, t^{dr_2}_{40}), and so (r, s_r) = (p_1, s_1). Since the if-condition in line 42 of dr_2 evaluates to True, (p_1, s_1) = (p_2, s_2). So let p = p_1 = p_2 and s = s_1 = s_2. Then process q writes (p, s) to A[q] at both t^{dr_1}_{40} and t^{dr_2}_{40}, and since A[q] is not changed elsewhere, we have 

A[q] = (p, s) throughout [t^{dr_1}_{40}, rsp(dr_2)].    (14) 


Suppose for the sake of contradiction that some process writes to X during [£(dr_1), £(dr_2)]. As q reads (x_2, p, s) from X at £(dr_2), the last write to X during interval [£(dr_1), £(dr_2)] must be by p and it must write triple (x_2, p, s) to X. Process q does not change the value stored in b during [rsp(dr_1), inv(dr_2)], and by the assumption b = False at inv(dr_2), thus, 

b = False at rsp(dr_1).    (15) 

Thus, by the definition of £ we have 

£(dr_1) = t^{dr_1}_{41}.    (16) 

Eq. (15) also implies that the if-condition in line 46 of dr_1 evaluates to True. Thus, q reads the triple (x_1, p, s) from X at t^{dr_1}_{41}. By (14), we have A[q] = (p, s) throughout [t^{dr_1}_{41}, rsp(dr_2)], and so by Claim 3 p does not write (x', p, s) to X, for any value of x', during [£(dr_1), rsp(dr_2)] = [t^{dr_1}_{41}, rsp(dr_2)], and so not during interval [£(dr_1), £(dr_2)], a contradiction. □ 

Claim 5. Consider two consecutive DRead() operations dr_1 and dr_2 by some process q. Suppose the if-condition in line 42 of dr_2 evaluates to False. Then a process writes to X during [£(dr_1), £(dr_2)]. 


Proof. Suppose process q reads some values (x_1, p_1, s_1) and (x_2, p_2, s_2) from X in line 38 of dr_1 respectively dr_2. Register A[q] can only be modified by q and only in line 40 of a DRead() operation, so A[q] = (p_1, s_1) throughout (t^{dr_1}_{40}, t^{dr_2}_{40}), and so (r, s_r) = (p_1, s_1). By the assumption that the if-condition in line 42 of dr_2 evaluates to False, (p_1, s_1) ≠ (p_2, s_2). Hence, the value (x_2, p_2, s_2) gets written to X during [t^{dr_1}_{38}, t^{dr_2}_{38}]. If £(dr_1) = t^{dr_1}_{38}, then the claim follows, as £(dr_2) is either t^{dr_2}_{38} or t^{dr_2}_{41}. 

Now suppose £(dr_1) = t^{dr_1}_{41}. By the definition of £, b = False at rsp(dr_1). Hence, q executes line 47, and thus it reads the same triple (x_1, p_1, s_1) from X in line 41 as it did in line 38 of dr_1. Suppose for the sake of contradiction that no process writes to X during [£(dr_1), £(dr_2)] = [t^{dr_1}_{41}, £(dr_2)] ⊇ [t^{dr_1}_{41}, t^{dr_2}_{38}]. Then X remains unchanged throughout that interval, and in particular at t^{dr_2}_{38} process q reads (x_1, p_1, s_1) from X, and so (x_1, p_1, s_1) = (x_2, p_2, s_2), a contradiction. □ 

Now, we prove that the sequential history S_H is valid. Consider the first DRead() dr by some process q. If no DWrite() linearizes before t^{dr}_{41}, then X has its initial value, (⊥, ⊥, ⊥), from the beginning of the execution until t^{dr}_{41}. Hence, q reads that triple from X in line 38 and also in line 41, and so the if-condition in line 46 evaluates to True, and dr returns (⊥, False). Since £(dr) is at or before t^{dr}_{41}, and thus before any DWrite() linearizes, this return value is correct. Now suppose some DWrite() operation linearizes before t^{dr}_{41} and the last such operation uses parameter x*. If that happens before t^{dr}_{38}, then q reads a triple (x*, p, s) from X in line 38, where (p, s) ≠ (⊥, ⊥). But when q executes line 39, A[q] = (⊥, ⊥), so q assigns ret the value (x*, True) in line 45, and thus dr later correctly returns that pair. If the first DWrite() operation linearizes in [t^{dr}_{38}, t^{dr}_{41}], then q reads (⊥, ⊥, ⊥) from X in line 38, and (x*, p', s') in line 41, where (p', s') ≠ (⊥, ⊥). Hence, the if-condition in line 42 evaluates to False, and dr returns correctly (x*, True). 

Now suppose dr is a DRead() by q, but it is not the first one. For ease of notation, we write dr_2 instead of dr, and we let dr_1 be the DRead() by q immediately preceding dr_2. Let (x*, p*, s*) be the triple that q reads from X in line 38 of dr_2, and so ret = (x*, g) is the return value of dr_2, for some g ∈ {True, False}. To prove that S_H is a valid history, we show that 

(a) X = (x*, ·, ·) at £(dr_2); and 

(b) g = True if and only if a DWrite() linearizes between £(dr_1) and £(dr_2). 
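Conditions (a) and (b) amount to the sequential specification of an ABA-detecting register. As a minimal sketch of that specification, the checker below walks a history that is already ordered by linearization points and verifies every DRead() return value. The tuple encoding of operations and the function name are assumptions chosen for illustration, and `None` stands in for the initial value ⊥.

```python
def is_valid_history(ops, initial=None):
    """Check a linearized history of an ABA-detecting register.

    ops is a list of ("DWrite", p, x) or ("DRead", q, (value, flag)) tuples,
    ordered by linearization point (hypothetical encoding).
    """
    value = initial   # current register value, condition (a)
    writes = 0        # number of DWrite() operations linearized so far
    seen = {}         # seen[q]: value of `writes` at q's previous DRead()
    for kind, proc, arg in ops:
        if kind == "DWrite":
            value = arg
            writes += 1
        elif kind == "DRead":
            # Condition (b): the flag is True iff a DWrite() linearized since
            # q's previous DRead() (or since the start, for q's first DRead()).
            expected = (value, writes > seen.get(proc, 0))
            if arg != expected:
                return False
            seen[proc] = writes
        else:
            raise ValueError(f"unknown operation {kind!r}")
    return True
```

A history in which a value is overwritten with an equal value and the next DRead() reports False is rejected, which is exactly the ABA that the register must detect.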


Proof of (a). By definition, either £(dr_2) = t^{dr_2}_{38} or £(dr_2) = t^{dr_2}_{41}. If £(dr_2) = t^{dr_2}_{38}, then (a) is immediate, because q reads (x*, p*, s*) from X in that line. If £(dr_2) = t^{dr_2}_{41}, then according to the definition of £(), we have b = False at the response of dr_2. Hence, q executes line 47 of dr_2, and thus it reads the same triple (x*, p*, s*) from X in line 41. It follows that X = (x*, ·, ·) when that happens, i.e., at £(dr_2) = t^{dr_2}_{41}. 

Proof of (b). First suppose g = False. This implies that line 43 is executed during dr_2, and b = False at the invocation of dr_2. Thus, by Claim 4, no process writes to X during [£(dr_1), £(dr_2)]. 

Now suppose g = True. Then either line 43 of dr_2 is executed and b = True at the invocation of dr_2, or line 45 of dr_2 is executed. In the latter case, the if-condition in line 42 evaluates to False, and so by Claim 5 a process writes to X during [£(dr_1), £(dr_2)]. Hence, consider the case that b = True at the invocation of dr_2. Since q's local variable b does not change between consecutive DRead() method calls by q, we also have b = True at rsp(dr_1). Hence, by Claim 1, a process writes to X during [£(dr_1), rsp(dr_1)] ⊆ [£(dr_1), £(dr_2)]. 


D. Proof of Theorem 2 


In this section, we prove Theorem 2 by showing that the implementation of the LL/SC/VL object using CAS given in Figure 3 is linearizable. Let inv(m) and rsp(m) denote the points in time of the invocation respectively response of some operation m. First we show that if a process p executes n unsuccessful consecutive CAS() operations during an LL() or an SC() operation, then at least one other process executes a successful CAS() during its SC() operation while the first process' CAS() operations fail. 

Claim 6. Suppose a process p executes n consecutive unsuccessful CAS() operations c_1, ..., c_n, all either in line 21 of an LL() or in line 6 of an SC(). Then during the time interval I that starts when p reads X for the last time before c_1 (in line 20, respectively line 3), and ends when p finishes c_n, another process executes a successful CAS() in line 6 of an SC() operation. 

Proof. Let r_i be the Read() operation p executes just before it executes c_i (in line 20, or line 3). Operation c_i fails if and only if a process executes a successful CAS() operation between p's r_i and c_i. As all c_1, ..., c_n fail, n successful CAS() operations must have happened during interval I. 

Now suppose for the sake of contradiction that none of these n successful CAS() operations during I was due to a CAS() in line 6. Hence, all n successful CAS() operations during I are due to a CAS() in line 21. Each successful CAS() operation by some process q in line 21 resets q's bit in the second component of X to 0. The second component of X has n bits, and each of these n bits can change to 0 at most once, as no CAS() in line 6 succeeds to change any of these bits to 1. Moreover, none of p's CAS() operations are successful. Hence, at most n - 1 successful CAS() in line 21 can be executed during I, a contradiction. □ 
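The object analysed in this section can be pictured with the following sequential sketch. It is reconstructed only from the line references used in this proof, so Figure 3 remains the authoritative description. The class name, the 0-based process ids, the use of process id p as the bit position, and the initial mask 0 are assumptions; methods run atomically here, so the retry loops never see interference.

```python
class LLSCFromCAS:
    """Sequential sketch of LL/SC/VL on one CAS object X = (value, n-bit mask)."""

    def __init__(self, n, initial=None):
        self.n = n
        self.X = (initial, 0)          # assumed initial mask: all bits 0
        self.b = [False] * n           # local flag b of each process

    def _cas(self, old, new):
        if self.X == old:
            self.X = new
            return True
        return False

    def ll(self, p):
        y, a = self.X                                  # line 14
        if not (a >> p) & 1:
            self.b[p] = False                          # line 16
            return y                                   # line 17
        for _ in range(self.n):
            y2, a2 = self.X                            # line 20
            if self._cas((y2, a2), (y2, a2 & ~(1 << p))):   # line 21: clear p's bit
                self.b[p] = False                      # line 22
                return y2                              # line 23
        self.b[p] = True                               # line 24
        return y                                       # line 25

    def sc(self, p, x):
        if self.b[p]:
            return False                               # line 1
        for _ in range(self.n):
            y, a = self.X                              # line 3
            if (a >> p) & 1:                           # line 4: p's bit is set
                return False                           # line 5
            if self._cas((y, a), (x, (1 << self.n) - 1)):   # line 6: set all bits
                return True                            # line 7
        return False                                   # line 8

    def vl(self, p):
        _, a = self.X                                  # line 9
        if self.b[p] or (a >> p) & 1:
            return False                               # line 10
        return True                                    # line 11
```

A successful SC() writes the mask 2^n - 1, so every process whose link predates it finds its own bit set and fails its next SC() or VL() until it executes LL() again.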


To prove Theorem 2, it suffices to prove that any history H on the implementation of the LL/SC/VL object given in Figure 3 is linearizable. For each operation m, we define the linearization point of m, £(m), as follows. For an unsuccessful SC() operation sc (i.e., it returns False in any of lines 1, 5 and 8), we define £(sc) = rsp(sc). A successful SC() operation sc (i.e., it returns True in line 7) linearizes at the point at which its CAS() in line 6 succeeds. For a VL() operation vl by some process p, if it returns False (in line 10), then £(vl) = rsp(vl). If vl returns True (in line 11), then vl linearizes at the point when p reads X in line 9 of vl. For an LL() operation ld by some process p, we let £(ld) be the point at which p executes line 14 if ld returns in either line 17 or line 25. If ld returns in line 23, then £(ld) is the point at which its CAS() in line 21 succeeds. It is not hard to see that the linearization point of each operation is between its invocation and its response. It only remains to show that the sequential history S_H obtained by ordering operations in H by their linearization points is valid. For that we first prove the following auxiliary claims. 

Claim 7. Consider some LL() operation ld by some process p. Then at rsp(ld), process p's bit in X is set or b = True, if and only if some successful SC() operation linearizes during (£(ld), rsp(ld)]. 

Proof. First we prove the if-then statement. The value of the local variable b is updated during each LL() operation, just before the operation returns (lines 16, 22 and 24). First consider the case in which b = True at rsp(ld). This case can only happen if ld returns in line 25. In this case all n CAS() operations in line 21 are unsuccessful. By Claim 6, some process executes a successful CAS() operation during an SC() operation sc, while p executes its n unsuccessful CAS(). As in this case £(ld) is when p executes line 14, operation sc linearizes at its successful CAS() operation during (£(ld), rsp(ld)]. 

Now, suppose b = False, but p's bit in X is set at rsp(ld). If ld returns in line 17, then p's bit in X is not set when p reads X in line 14 at £(ld). However, by the assumption, p's bit in X is set when ld responds. This bit is only set when a CAS() operation succeeds during an SC() operation. Hence, a process executes a successful CAS() during an SC() operation (and thus its SC() linearizes) during (£(ld), rsp(ld)]. If operation ld returns in line 23, p's last CAS() operation in line 21 of ld must have been successful, and so p's bit in X must have changed to 0. But by the assumption, p's bit is set at rsp(ld), so some other process must have changed it back to 1 after p's successful CAS(). As £(ld) is when p's CAS() succeeds, the value of p's bit changes after £(ld) and before rsp(ld). Recall that p's bit is only set when a process executes a successful CAS() operation during an SC(). Therefore, some process must have executed a successful CAS() operation during an SC() and thus its SC() linearized during (£(ld), rsp(ld)]. 

Now we prove the only-then statement. For that we show that if p's bit in X is not set and b = False at rsp(ld), then no successful SC() operation linearizes during (£(ld), rsp(ld)]. Since b = False, ld returns in line 17 or in line 23. First suppose ld returns in line 17. Then p's bit in X is 0 when p reads X in line 14 at £(ld), and it is 0 at rsp(ld). This bit can only be reset in line 21 of an LL() operation by p, and p does not execute line 21 during ld in this case. Hence p's bit has value 0 throughout (£(ld), rsp(ld)]. As all processes' bits in X change to 1 when an SC() linearizes at its successful CAS() in line 6, no successful CAS() of an SC() happens throughout (£(ld), rsp(ld)]. 

Now suppose ld returns in line 23. In this case ld linearizes when its successful CAS() changes p's bit to 0. Process p's bit is not set at rsp(ld), and as p does not try to change the value of its bit after its successful CAS(), and thus after £(ld), p's bit has value 0 throughout (£(ld), rsp(ld)], and so with the same argument as before, no SC() linearizes at its successful CAS() operation in this interval. □ 


Claim 8. Consider a successful CAS() operation cas in line 6 of an SC() operation sc and the last Read() operation r executed before cas in line 3 of sc. Then no successful SC() linearizes between r and cas. 

Proof. Let p be the process which executes sc, and (y*, a*) be the value p reads from X when it executes r. As cas succeeds, the if-condition in line 4 cannot have evaluated to True. Hence, p's bit in X must be 0 when p executes r, and so a* ≠ 2^n - 1. Moreover, since cas is successful, the value of X is (y*, a*) just before cas is executed. Suppose for the sake of contradiction that at least one successful SC() operation linearizes between r and cas. Note that the value of X is updated at the linearization point of a successful SC() operation. Thus, the last successful SC() executed between r and cas must update the value of X to (y*, a*). However, a successful SC() operation changes the second component of X to 2^n - 1, and so a* = 2^n - 1, a contradiction. □ 

Claim 9. Consider an SC() operation sc by some process p and let ld be the last LL() operation by the same process p executed before sc. Then sc is successful if and only if no successful SC() operation linearizes between ld and sc. 

Proof. First we prove the if-then statement. Operation sc is successful if one of its CAS() operations, cas, succeeds in line 6 and so sc returns in line 7. Hence, b = False at inv(sc). As p's local variable b is only changed during an LL() operation, 

b = False at rsp(ld).    (17) 

Moreover, since sc returns in line 7, p reads value 0 from its bit when it reads X in line 3 the last time before cas at some point t. This bit can only be reset in line 21 of an LL() operation by p, hence, 

p's bit is 0 throughout [rsp(ld), t].    (18) 

Hence by Eq. (17), Eq. (18), and Claim 7, no successful SC() operation linearizes during (£(ld), rsp(ld)]. Moreover, by Eq. (18) no successful SC() linearizes throughout [rsp(ld), t], as otherwise the value of p's bit would change to 1. Moreover, by Claim 8 no successful SC() linearizes during [t, £(sc)], as £(sc) is when cas succeeds. Therefore, no successful SC() linearizes throughout (£(ld), £(sc)]. 

Now we show the only-if statement is also true, by showing that if sc is not successful, then at least one successful SC() operation linearizes between ld and sc. There are three cases where sc can return. The first case is if sc returns in line 1 and so b = True at inv(sc). Process p's local variable b does not change outside an LL() operation, hence, b = True at rsp(ld). By Claim 7, a successful SC() operation linearizes during (£(ld), rsp(ld)] ⊆ (£(ld), £(sc)]. 

The second case happens when sc returns in line 5. In this case, p's bit is set when p reads X in line 3 for the last time during sc at some point t. Note that £(sc) = rsp(sc) > t. Now suppose p's bit is 0 at rsp(ld). Hence, some process sets this bit with a successful CAS() at the linearization point of a successful SC() operation during (rsp(ld), t] ⊆ (£(ld), £(sc)]. If p's bit is 1 at rsp(ld), then by Claim 7 a successful SC() operation linearizes during (£(ld), rsp(ld)] ⊆ (£(ld), £(sc)]. 

The last case is when sc returns in line 8. This implies that all n CAS() operations of p during sc failed. Thus by Claim 6, a successful CAS() in line 6 of an SC() happens during [inv(sc), rsp(sc)] and so a successful SC() linearizes during (£(ld), £(sc)]. □ 

Claim 10. Consider a VL() operation vl by some process p and let ld be the last LL() operation by the same process p executed before vl. Then vl returns True if and only if no successful SC() operation linearizes between ld and vl. 

Proof. First we prove the if-then statement. Operation vl returns True, hence, b = False at inv(vl). As p's local variable b is only changed during an LL() operation, 

b = False at rsp(ld).    (19) 

Moreover, since vl returns True, p reads value 0 from its bit when it reads X in line 9 of vl at the linearization point of vl. This bit can only be reset in line 21 of an LL() operation by p, hence, 

p's bit is 0 throughout [rsp(ld), £(vl)].    (20) 

Hence by Eq. (19), Eq. (20), and Claim 7, no successful SC() operation linearizes during (£(ld), rsp(ld)]. Moreover, by Eq. (20), no successful SC() linearizes throughout [rsp(ld), £(vl)], as otherwise the value of p's bit would change to 1. Therefore, no successful SC() linearizes throughout (£(ld), £(vl)]. 

Now we show the only-if statement is also true, by showing that if vl returns False, then at least one successful SC() operation linearizes between ld and vl. There are two cases that can cause vl to return False. The first case is if b = True at inv(vl). Process p's local variable b does not change outside an LL() operation, hence, b = True at rsp(ld). By Claim 7, a successful SC() operation linearizes during (£(ld), rsp(ld)] ⊆ (£(ld), £(vl)]. The second case is when p's bit is set when p reads X in line 9 of vl at some point t. Note that £(vl) = rsp(vl) > t. Now suppose p's bit is 0 at rsp(ld). Hence, some process sets this bit with a successful CAS() at the linearization point of a successful SC() operation during (rsp(ld), t] ⊆ (£(ld), £(vl)]. If p's bit is 1 at rsp(ld), then by Claim 7, a successful SC() operation linearizes during (£(ld), rsp(ld)] ⊆ (£(ld), £(vl)]. □ 


Now we can quickly argue why S_H is valid. Consider some LL() operation ld in S_H that returns some value y. For S_H to be valid, register X must contain value y at the linearization point of ld. Recall that £(ld) is the point at which p executes line 14, if ld returns in either line 17 or line 25. Moreover, if ld returns in line 23 then £(ld) is the point at which p's CAS() in line 21 succeeds. In the former case, y is the value that p reads from X in line 14 at £(ld). In the latter case, y is the value that p writes into X during its successful CAS() operation in line 21 at £(ld). Hence ld returns a valid value. This, in addition to the results of Claim 9 and Claim 10, completes the proof that the resulting history S_H is valid. 
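The validity conditions established by Claims 9 and 10 and the argument for LL() can be summarised as a sequential specification. The checker below is a minimal sketch of that specification for a history already ordered by linearization points. The tuple encoding and the function name are assumptions for illustration, and treating an SC() or VL() with no preceding LL() by the same process as failing is an assumption as well, since the claims only consider operations that follow an LL().

```python
def is_valid_llsc_history(ops, initial=None):
    """Check a linearized LL/SC/VL history.

    ops is a list of (kind, process, argument, result) tuples ordered by
    linearization point (hypothetical encoding); argument is used by SC only.
    """
    value = initial
    version = 0      # number of successful SC() operations linearized so far
    linked = {}      # linked[p]: `version` at p's most recent LL()
    for kind, p, arg, result in ops:
        if kind == "LL":
            if result != value:          # LL() must return the current value
                return False
            linked[p] = version
        elif kind == "VL":
            # True iff no successful SC() linearized since p's last LL().
            if result != (linked.get(p) == version):
                return False
        elif kind == "SC":
            ok = linked.get(p) == version
            if result != ok:
                return False
            if ok:
                value = arg
                version += 1
        else:
            raise ValueError(f"unknown operation {kind!r}")
    return True
```

Counting successful SC() operations with an unbounded `version` is only a device for checking histories; the point of the lower bounds in this paper is that an implementation from bounded objects cannot rely on such a counter.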